(ii) TITLE OF INVENTION: HUMAN SEMAPHORIN L (H-SEMAL) AND
CORRESPONDING SEMAPHORINS IN OTHER SPECIES

(iii) NUMBER OF SEQUENCES: 44 

(iv) CORRESPONDENCE ADDRESS : 
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2636 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:







CGGGGCCACG GGATGACGCC TCCTCCGCCC GGACGTGCCG CCCCCAGCGC ACCGCGCGCC 
CGCGTCCCTG GCCCGCCGGC TCGGTTGGGG CTTCCGCTGC GGCTGCGGCT GCTGCTGCTG 
CTCTGGGCGG CCGCCGCCTC CGCCCAGGGC CACCTAAGGA GCGGACCCCG CATCTTCGCC 
GTCTGGAAAG GCCATGTAGG GCAGGACCGG GTGGACTTTG GCCAGACTGA GCCGCACACG 
GTGCTTTTCC ACGAGCCAGG CAGCTCCTCT GTGTGGGTGG GAGGACGTGG CAAGGTCTAC
CTCTTTGACT TCCCCGAGGG CAAGAACGCA TCTGTGCGCA CGGTGAATAT CGGCTCCACA 
AAGGGGTCCT GTCTGGATAA GCGGGACTGC GAGAACTACA TCACTCTCCT GGAGAGGCGG 
AGTGAGGGGC TGCTGGCCTG TGGCACCAAC GCCCGGCACC CCAGCTGCTG GAACCTGGTG 
AATGGCACTG TGGTGCCACT TGGCGAGATG AGAGGCTACG CCCCCTTCAG CCCGGACGAG 
AACTCCCTGG TTCTGTTTGA AGGGGACGAG GTGTATTCCA CCATCCGGAA GCAGGAATAC 
AATGGGAAGA TCCCTCGGTT CCGCCGCATC CGGGGCGAGA GTGAGCTGTA CACCAGTGAT 
ACTGTCATGC AGAACCCACA GTTCATCAAA GCCACCATCG TGCACCAAGA CCAGGCTTAC 
GATGACAAGA TCTACTACTT CTTCCGAGAG GACAATCCTG ACAAGAATCC TGAGGCTCCT 
CTCAATGTGT CCCGTGTGGC CCAGTTGTGC AGGGGGGACC AGGGTGGGGA AAGTTCACTG 
TCAGTCTCCA AGTGGAACAC TTTTCTGAAA GCCATGCTGG TATGCAGTGA TGCTGCCACC 
AACAAGAACT TCAACAGGCT GCAAGACGTC TTCCTGCTCC CTGACCCCAG CGGCCAGTGG 
AGGGACACCA GGGTCTATGG TGTTTTCTCC AACCCCTGGA ACTACTCAGC CGTCTGTGTG 
TATTCCCTCG GTGACATTGA CAAGGTCTTC CGTACCTCCT CACTCAAGGG CTACCACTCA 
AGCCTTCCCA ACCCGGGGCC TGGCAAGTGC CTCCCAGACC AGCAGCCGAT ACCCACAGAG 
ACCTTCCAGG TGGCTGACCG TCACCCAGAG GTGGCGCAGA GGGTGGAGCC CATGGGGCCT 
CTGAAGACGC CATTGTTCCA CTCTAAATAC CACTAGCAGA AAGTGGCCGT TCACCGCATG 
CAAGCCAGCC ACGGGGAGAC CTTTCATGTG CTTTACCTAA CTACAGACAG GGGCACTATC
CACAAGGTGG TGGAACCGGG GGAGCAGGAG CACAGCTTCG CCTTCAACAT CATGGAGATC 
CAGCCCTTCC GCCGCGCGGC TGCCATCCAG ACCATGTCGC TGGATGCTGA GCGGAGGAAG 
CTGTATGTGA GCTCCCAGTG GGAGGTGAGC CAGGTGCCCC TGGACCTGTG TGAGGTCTAT
GGCGGGGGCT GCCACGGTTG CCTCATGTCC CGAGACCCCT ACTGCGGCTG GGACCAGGGC 
CGCTGCATCT CCATCTACAG CTCCGAACGG TCAGTGCTGC AATCCATTAA TCCAGCCGAG
CCACACAAGG AGTGTCCCAA CCCCAAACCA GACAAGGCCC CACTGCAGAA GGTTTCCCTG 
GCCCCAAACT CTCGCTACTA CCTGAGCTGC CCCATGGAAT CCCGCCACGC CACCTACTCA




TGGCGCCACA AGGAGAACGT GGAGCAGAGC TGCGAACCTG GTCACCAGAG CCCCAACTGC 1800

ATCCTGTTCA TCGAGAACCT CACGGCGCAG CAGTACGGCC ACTACTTCTG GGAGGCCCAG 1860 

GAGGGCTCCT ACTTCCGCGA GGCTCAGCAC TGGCAGCTGC TGGCCGAGGA CGGCATCATG 1920 

GCCGAGCACC TGCTGGGTCA TGCCTGTGCC CTGGCTGCCT CCCTCTGGCT GGGGGTGCTG 1980 

CCCACACTCA CTCTTGGCTT GGTGGTCCAC TAGGGCCTGG CGAGGCTGGG CATGCCTCAG 2040

GCTTCTGCAG CCCAGGGCAC TAGAACGTGT CACACTCAGA GCGGGCTGGC CCGGGAGCTC 2100 

CTTGCCTGCC ACTTCTTCCA GGGGACAGAA TAACCCAGTG GAGGATGCCA GGGCTGGAGA 2160

CGTCCAGCCG CAGGCGGCTG CTGGGCCCCA GGTGGCGGAC GGATGGTGAG GGGGTGAGAA 2220

TGAGGGCACC GACTGTGAAG GTGGGGCATG GATGACCGAA GACTTTATCT TCTGGAAAAT 2280

ATTTTTCAGA CTCCTCAAAC TTGACTAAAT GCAGCGATGC TCCCAGCCCA AGAGCCCATG 2340

GGTCGGGGAG TGGGTTTGGA TAGGAGAGCT GGGAGTCCAT CTCGACCCTG GGGCTGAGGC 2400 
CTGAGTCCTT GTGGACTCTT GGTACCCACA TTGCCTCCTT CCCCTGCCTC TCTCATGGCT 2460

GGGTGGCTGG TGTTCCTGAA GACCCAGGGG TAGCCTCTGT CCAGCCCTGT CCTCTGCAGC 2520

TCCCTCTCTG GTCCTGGGTC CCACAGGACA GGCGCCTTGC ATGTTTATTG AAGGATGTTT 2580 

GCTTTCCGGA CGGAAGGACG GAAAAAGCTC TGAAAAAAAA AAAAAAAAAA AAAAAA 2636
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1195 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

CGGGGCTGCG GGATGACGCC TCCTCCTCCC GGACGTGCCG CCCCCAGCGC ACCGCGCGCC 60 

CGCGTCCTCA GCCTGCCGGC TCGGTTCGGG CTCCCGCTGC GGCTGCGGCT TCTGCTGGTG 120

TTCTGGGTGG CCGCCGCCTC CGCCCAAGGC CACTCGAGGA GCGGACCCCG CATCTCCGCC 180

GTCTGGAAAG GGCAGGACCA TGTGGACTTT AGCCAGCCTG AGCCACACAC CGTGCTTTTC 240

CATGAGCCGG GCAGCTTCTC TGTCTGGGTG GGTGGACGTG GCAAGGTCTA CCACTTCAAC 300

TTCCCCGAGG GCAAGAATGC CTCTGTGCGC ACGGTGAACA TCGGCTCCAC AAAGGGGTCC 360 



TGTCAGGACA AACAGGACTG TGGGAATTAC ATCACTCTTC TAGAAAGGCG GGGTAATGGG 420 

CTGCTGGTCT GTGGCACCAA TGCCCGGAAG CCGAGCTGCT GGAACTTGGT GAATGACAGT 480 

GTGGTGATGT CACTTGGTGA GATGAAAGGC TATGCCCCCT TCAGCCCGGA TGAGAACTCC 540 

CTGGTTCTGT TTGAAGGAGA TGAAGTGTAC TCTACCATCC GGAAGCAGGA ATACAACGGG 600 
AAGATCCCTC GGTTTCGACG CATTCGGGGC GAGAGTGAAC TGTACACAAG TGATAGAGTC 660

ATGCAGAACC CACAGTTCAT CAAGGCCACC ATTGTGCACC AAGACCAAGC CTATGATGAT 720

AAGATCTACT ACTTCTTCCG AGAAGACAAC CCTGACAAGA ACCGGGAGGG TCCTCTCAAT 780

GTGTGCCGAG TAGCCCAGTT GTGGAGGGGG GACCAGGGTG GTGAGAGTTC GTTGTCTGTC 840

TCCAAGTGGA ACACCTTCCT GAAAGCCATG TTGGTCTGCA GCGATGCAGC CACCAACAGG 900

AACTTCAATC GGCTGCAAGA TGTCTTCGTG CTCCCTGACC CCAGTGGCCA GTGGAGAGAT 960

ACCAGGGTCT ATGGCGTTTT CTCGAACCCC TGGAACTAGT CAGCTGTCTG CGTGTATTCG 1020

CTTGGTGACA TTGACAGAGT CTTCCGTACC TCATCGCTCA AAGGCTAGCA CATGGGCCTT 1080

TCCAACCCTC GACCTGGCAT GTGCCTCCCA AAAAAGCAGC CCATAGCCAC AGAAACCTTC 1140

CAGGTAGCTG ATAGTCACCC AGAGGTGGCT CAGAGGGTGG AACCTATGGG GCCCC 1195 
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 666 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: n/a

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Thr Pro Pro Pro Pro Gly Arg Ala Ala Pro Ser Ala Pro Arg Ala
1 5 10 15

Arg Val Pro Gly Pro Pro Ala Arg Leu Gly Leu Pro Leu Arg Leu Arg
20 25 30

Leu Leu Leu Leu Leu Trp Ala Ala Ala Ala Ser Ala Gln Gly His Leu
35 40 45

Arg Ser Gly Pro Arg Ile Phe Ala Val Trp Lys Gly His Val Gly Gln
50 55 60

Asp Arg Val Asp Phe Gly Gln Thr Glu Pro His Thr Val Leu Phe His
65 70 75 80

Glu Pro Gly Ser Ser Ser Val Trp Val Gly Gly Arg Gly Lys Val Tyr
85 90 95

Leu Phe Asp Phe Pro Glu Gly Lys Asn Ala Ser Val Arg Thr Val Asn
100 105 110

Ile Gly Ser Thr Lys Gly Ser Cys Leu Asp Lys Arg Asp Cys Glu Asn
115 120 125

Tyr Ile Thr Leu Leu Glu Arg Arg Ser Glu Gly Leu Leu Ala Cys Gly
130 135 140

Thr Asn Ala Arg His Pro Ser Cys Trp Asn Leu Val Asn Gly Thr Val
145 150 155 160

Val Pro Leu Gly Glu Met Arg Gly Tyr Ala Pro Phe Ser Pro Asp Glu
165 170 175

Asn Ser Leu Val Leu Phe Glu Gly Asp Glu Val Tyr Ser Thr Ile Arg
180 185 190

Lys Gln Glu Tyr Asn Gly Lys Ile Pro Arg Phe Arg Arg Ile Arg Gly
195 200 205

Glu Ser Glu Leu Tyr Thr Ser Asp Thr Val Met Gln Asn Pro Gln Phe
210 215 220

Ile Lys Ala Thr Ile Val His Gln Asp Gln Ala Tyr Asp Asp Lys Ile
225 230 235 240

Tyr Tyr Phe Phe Arg Glu Asp Asn Pro Asp Lys Asn Pro Glu Ala Pro
245 250 255

Leu Asn Val Ser Arg Val Ala Gln Leu Cys Arg Gly Asp Gln Gly Gly
260 265 270

Glu Ser Ser Leu Ser Val Ser Lys Trp Asn Thr Phe Leu Lys Ala Met
275 280 285

Leu Val Cys Ser Asp Ala Ala Thr Asn Lys Asn Phe Asn Arg Leu Gln
290 295 300

Asp Val Phe Leu Leu Pro Asp Pro Ser Gly Gln Trp Arg Asp Thr Arg
305 310 315 320

Val Tyr Gly Val Phe Ser Asn Pro Trp Asn Tyr Ser Ala Val Cys Val
325 330 335

Tyr Ser Leu Gly Asp Ile Asp Lys Val Phe Arg Thr Ser Ser Leu Lys
340 345 350

Gly Tyr His Ser Ser Leu Pro Asn Pro Arg Pro Gly Lys Cys Leu Pro
355 360 365

Asp Gln Gln Pro Ile Pro Thr Glu Thr Phe Gln Val Ala Asp Arg His
370 375 380

Pro Glu Val Ala Gln Arg Val Glu Pro Met Gly Pro Leu Lys Thr Pro
385 390 395 400

Leu Phe His Ser Lys Tyr His Tyr Gln Lys Val Ala Val His Arg Met
405 410 415

Gln Ala Ser His Gly Glu Thr Phe His Val Leu Tyr Leu Thr Thr Asp
420 425 430

Arg Gly Thr Ile His Lys Val Val Glu Pro Gly Glu Gln Glu His Ser
435 440 445

Phe Ala Phe Asn Ile Met Glu Ile Gln Pro Phe Arg Arg Ala Ala Ala
450 455 460

Ile Gln Thr Met Ser Leu Asp Ala Glu Arg Arg Lys Leu Tyr Val Ser
465 470 475 480

Ser Gln Trp Glu Val Ser Gln Val Pro Leu Asp Leu Cys Glu Val Tyr
485 490 495

Gly Gly Gly Cys His Gly Cys Leu Met Ser Arg Asp Pro Tyr Cys Gly
500 505 510

Trp Asp Gln Gly Arg Cys Ile Ser Ile Tyr Ser Ser Glu Arg Ser Val
515 520 525

Leu Gln Ser Ile Asn Pro Ala Glu Pro His Lys Glu Cys Pro Asn Pro
530 535 540

Lys Pro Asp Lys Ala Pro Leu Gln Lys Val Ser Leu Ala Pro Asn Ser
545 550 555 560

Arg Tyr Tyr Leu Ser Cys Pro Met Glu Ser Arg His Ala Thr Tyr Ser
565 570 575

Trp Arg His Lys Glu Asn Val Glu Gln Ser Cys Glu Pro Gly His Gln
580 585 590

Ser Pro Asn Cys Ile Leu Phe Ile Glu Asn Leu Thr Ala Gln Gln Tyr
595 600 605

Gly His Tyr Phe Cys Glu Ala Gln Glu Gly Ser Tyr Phe Arg Glu Ala
610 615 620

Gln His Trp Gln Leu Leu Pro Glu Asp Gly Ile Met Ala Glu His Leu
625 630 635 640

Leu Gly His Ala Cys Ala Leu Ala Ala Ser Leu Trp Leu Gly Val Leu
645 650 655

Pro Thr Leu Thr Leu Gly Leu Leu Val His
660 665



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 394 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: n/a

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Thr Pro Pro Pro Pro Gly Arg Ala Ala Pro Ser Ala Pro Arg Ala
1 5 10 15

Arg Val Leu Ser Leu Pro Ala Arg Phe Gly Leu Pro Leu Arg Leu Arg
20 25 30

Leu Leu Leu Val Phe Trp Val Ala Ala Ala Ser Ala Gln Gly His Ser
35 40 45

Arg Ser Gly Pro Arg Ile Ser Ala Val Trp Lys Gly Gln Asp His Val
50 55 60

Asp Phe Ser Gln Pro Glu Pro His Thr Val Leu Phe His Glu Pro Gly
65 70 75 80

Ser Phe Ser Val Trp Val Gly Gly Arg Gly Lys Val Tyr His Phe Asn
85 90 95

Phe Pro Glu Gly Lys Asn Ala Ser Val Arg Thr Val Asn Ile Gly Ser
100 105 110

Thr Lys Gly Ser Cys Gln Asp Lys Gln Asp Cys Gly Asn Tyr Ile Thr
115 120 125

Leu Leu Glu Arg Arg Gly Asn Gly Leu Leu Val Cys Gly Thr Asn Ala
130 135 140

Arg Lys Pro Ser Cys Trp Asn Leu Val Asn Asp Ser Val Val Met Ser
145 150 155 160

Leu Gly Glu Met Lys Gly Tyr Ala Pro Phe Ser Pro Asp Glu Asn Ser
165 170 175

Leu Val Leu Phe Glu Gly Asp Glu Val Tyr Ser Thr Ile Arg Lys Gln
180 185 190

Glu Tyr Asn Gly Lys Ile Pro Arg Phe Arg Arg Ile Arg Gly Glu Ser
195 200 205

Glu Leu Tyr Thr Ser Asp Thr Val Met Gln Asn Pro Gln Phe Ile Lys
210 215 220

Ala Thr Ile Val His Gln Asp Gln Ala Tyr Asp Asp Lys Ile Tyr Tyr
225 230 235 240

Phe Phe Arg Glu Asp Asn Pro Asp Lys Asn Pro Glu Ala Pro Leu Asn
245 250 255

Val Ser Arg Val Ala Gln Leu Cys Arg Gly Asp Gln Gly Gly Glu Ser
260 265 270

Ser Leu Ser Val Ser Lys Trp Asn Thr Phe Leu Lys Ala Met Leu Val
275 280 285

Cys Ser Asp Ala Ala Thr Asn Arg Asn Phe Asn Arg Leu Gln Asp Val
290 295 300

Phe Leu Leu Pro Asp Pro Ser Gly Gln Trp Arg Asp Thr Arg Val Tyr
305 310 315 320

Gly Val Phe Ser Asn Pro Trp Asn Tyr Ser Ala Val Cys Val Tyr Ser
325 330 335

Leu Gly Asp Ile Asp Arg Val Phe Arg Thr Ser Ser Leu Lys Gly Tyr
340 345 350

His Met Gly Leu Ser Asn Pro Arg Pro Gly Met Cys Leu Pro Lys Lys
355 360 365

Gln Pro Ile Pro Thr Glu Thr Phe Gln Val Ala Asp Ser His Pro Glu
370 375 380

Val Ala Gln Arg Val Glu Pro Met Gly Pro
385 390

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
ACTCACTATA GGGCTCGAGC GGC 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6
AGCCGCACAC GGTGCTTTTC 
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7 
GCACAGATGC GTTCTTGCCC 
(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8
ACCATAGACC CTGGTGTCCC 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9



GCAGTGATGC TGCCACCAAC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10 
CCAGACCATG TCGCTGGATG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
ACATGAGGCA ACCGTGGCAG 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 
CCATCCTAAT ACGACTCACT ATAGGGC 
(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 
AGGTAGACCT TGCCACGTCC 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 
GAACTTCAAC AGGCTGCAAG ACG 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 
ATGCTGAGCG GAGGAAGCTG 
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CCGGCATACA CCTCACACAG 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
CTGGAAGCTT TCTGTGGGTA TCGGCTGC 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
TTTGGATCCC TGGTTCTGTT TGAAG 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 



TTCTAGAATT CAGCGGCCGC TTTTTTTTTT TTTTTTTTTT TTTTTTTTTT 50



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GGGGAAAGTT CACTGTCAGT CTCCAAG 
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
GGGAATACAC ACAGACGGCT GAGTAG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

AGCAAGTTCA GCCTGGTTAA GT 22

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 



(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
TTATGAGTAT TTCTTCCAGG G 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24
CCATTAATCC AGCCGAGCCA CACAAG 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
CATCTACAGC TCCGAACGGT CAGTG 
(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
CAGCGGAAGC CCCAACCGAG 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
GGGATGACGC CTCCTCCGCC CGG 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
AAGCTTCACG TGGACCAGCA AGCCAAGAGT G 
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29 
AAGCTTTTTC CGTCCTTCCG TCCGG 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
ATGGTGAGCA AGGGCGAGGA GCTG 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
CTTGTACAGC TCGTCCATGC CGAG 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
GGGTGGTGAG AGTTCGTTGT CTGTC 
(2) INFORMATION FOR SEQ ID NO: 33: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GAGCGATGAG GTACGGAAGA CTCTG 25 
(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5856 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 60 

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 120 

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 180 

TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTTC 240 

ACGTGGACCA GCAAGCCAAG AGTGAGTGTG GGCAGCACCC CCAGCCAGAG GGAGGCAGCC 300 

AGGGCACAGG CATGACCCAG CAGGTGCTCG GCCATGATGC CGTCCTCGGG CAGCAGCTGC 360 

CAGTGCTGAG CCTCGCGGAA GTAGGAGCCC TCCTGGGCCT CGCAGAAGTA GTGGCCGTAC 420 

TGCTGCGCCG TGAGGTTCTC GATGAACAGG ATGCAGTTGG GGCTCTGGTG ACCAGGTTCG 480 

CAGCTCTGCT CCACGTTCTC CTTGTGGCGC CATGAGTAGG TGGCGTGGCG GGATTCCATG 540 

GGGCAGCTCA GGTAGTAGCG AGAGTTTGGG GCCAGGGAAA CCTTCTGCAG TGGGGCCTTG 600 

TCTGGTTTGG GGTTGGGACA CTCCTTGTGT GGCTCGGCTG GATTAATGGA TTGCAGCACT 660 

GACCGTTCGG AGCTGTAGAT GGAGATGCAG CGGCCCTGGT CCCAGCCGCA GTAGGGGTCT 720 

CGGGACATGA GGCAACCGTG GCAGCCCCCG CCATAGACCT CACACAGGTC CAGGGGCACC 780 

TGGCTCACCT CCCACTGGGA GCTCACATAC AGCTTCCTCC GCTCAGCATC CAGCGACATG 840 

GTCTGGATGG CAGCCGCGCG GCGGAAGGGC TGGATCTCCA TGATGTTGAA GGCGAAGCTG 900 



TCGGTCGCCG GGCGCGGTAT TCTCAGAATG ACTTGGTTGA GTACTCACCA GTCACAGAAA 4380 

AGCATCTTAC GGATGGCATG ACAGTAAGAG AATTATGCAG TGCTGCCATA ACCATGAGTG 4440 

ATAACACTGC GGCCAACTTA CTTCTGACAA CGATCGGAGG ACCGAAGGAG CTAACCGCTT 4500 

TTTTGCACAA CATGGGGGAT CATGTAACTC GCCTTGATCG TTGGGAACCG GAGCTGAATG 4560 

AAGCCATACC AAACGACGAG AGTGACACCA CGATGCCTGT AGCAATGCCA ACAACGTTGC 4620

GCAAACTATT AACTGGCGAA CTACTTACTC TAGCTTCCCG GCAACAATTA ATAGACTGGA 4680 

TGGAGGCGGA TAAAGTTGCA GGACCACTTC TGCGCTCGGC CCTTCCGGCT GGCTGGTTTA 4740

TTGCTGATAA ATCTGGAGCC GGTGAGCGTG GGTCTCGCGG TATCATTGCA GCACTGGGGC 4800 

CAGATGGTAA GCCCTCCCGT ATCGTAGTTA TCTACACGAC GGGGAGTCAG GCAACTATGG 4860

ATGAACGAAA TAGACAGATC GCTGAGATAG GTGCCTCACT GATTAAGCAT TGGTAACTGT 4920 

CAGACCAAGT TTACTCATAT ATACTTTAGA TTGATTTAAA ACTTCATTTT TAATTTAAAA 4980

GGATCTAGGT GAAGATCCTT TTTGATAATC TCATGACCAA AATCCCTTAA CGTGAGTTTT 5040 

CGTTCCACTG AGCGTCAGAC CCCGTAGAAA AGATCAAAGG ATCTTCTTGA GATCCTTTTT 5100 

TTCTGCGCGT AATCTGCTGC TTGCAAACAA AAAAACCACC GCTACCAGCG GTGGTTTGTT 5160 

TGCCGGATCA AGAGCTACCA ACTCTTTTTC GGAAGGTAAC TGGCTTCAGC AGAGCGCAGA 5220

TACCAAATAC TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA CCACTTCAAG AACTCTGTAG 5280

CACCGCCTAC ATACCTCGCT CTGCTAATCC TGTTACCAGT GGCTGCTGCC AGTGGCGATA 5340 

AGTCGTGTCT TACCGGGTTG GACTCAAGAC GATAGTTACC GGATAAGGCG CAGCGGTCGG 5400 

GCTGAACGGG GGGTTCGTGC ACACAGCCCA GCTTGGAGCG AACGACCTAC ACCGAACTGA 5460 

GATACCTACA GCGTGAGCAT TGAGAAAGCG CCACGCTTCC CGAAGGGAGA AAGGCGGACA 5520 

GGTATCCGGT AAGCGGCAGG GTCGGAACAG GAGAGCGCAC GAGGGAGCTT CCAGGGGGAA 5580

ACGCCTGGTA TCTTTATAGT CCTGTCGGGT TTGGCCACCT CTGACTTGAG CGTCGATTTT 5640 

TGTGATGCTC GTCAGGGGGG CGGAGCCTAT GGAAAAACGC CAGCAACGCG GCCTTTTTAC 5700 

GGTTCCTGGC CTTTTGCTGG CCTTTTGCTC ACATGTTCTT TCCTGCGTTA TCCCCTGATT 5760 

CTGTGGATAA CCGTATTACC GCCTTTGAGT GAGCTGATAC CGCTCGCCGC AGCCGAACGA 5820 
CCGAGCGCAG CGAGTCAGTG AGCGAGGAAG CGGAAG 5856 
(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7475 base pairs

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 60 

CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120 

CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180 

TTAGGGTTAG GCGTTTTGCG CTGCTTCGCG ATGTACGGGC CAGATATACG CGTTGACATT 240 
GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 300

TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC 360

CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 420 

ATTGACGTCA ATGGGTGGAC TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 480

ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 540 

ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 600 

TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 660 

ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 720 

AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 780 

GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTGT CTGGCTAACT AGAGAACCCA 840 

CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC 900 

GTTTAAACGG GCCCTCTAGA CTCGAGCGGC CGCCACTGTG CTGGATATCT GCAGAATTCG 960 
GCTTGGGATG ACGCCTCCTC CGCCCGGACG TGCCGCCCCC AGCGCACCGC GCGCCCGCGT 1020

CCCTGGCCCG CCGGCTCGGT TGGGGCTTCC GCTGCGGCTG CGGCTGCTGC TGCTGCTCTG 1080 

GGCGGCCGCC GCCTCCGCCC AGGGCCACCT AAGGAGCGGA CCCCGCATCT TCGCCGTCTG 1140 

GAAAGGCCAT GTAGGGCAGG AGCGGGTGGA CTTTGGCCAG ACTGAGCCGC ACACGGTGCT 1200 

TTTCCACGAG CCAGGCAGCT CCTCTGTGTG GGTGGGAGGA CGTGGCAAGG TCTACCTCTT 1260

TGACTTCCCC GAGGGCAAGA ACGCATCTGT GCGCACGGTG AATATCGGCT CCACAAAGGG 1320 

GTCCTGTCTG GATAAGCGGG ACTGCGAGAA CTACATCACT CTCCTGGAGA GGCGGAGTGA 1380

GGGGCTGCTG GCCTGTGGCA CCAACGCCCG GCACCCCAGC TGCTGGAACC TGGTGAATGG 1440 



TCCTTGACCC TGGAAGGTGC CACTCCCACT GTCCTTTCCT AATAAAATGA GGAAATTGCA 3180 

TCGCATTGTC TGAGTAGGTG TCATTCTATT CTGGGGGGTG GGGTGGGGCA GGACAGCAAG 3240 

GGGGAGGATT GGGAAGACAA TAGCAGGCAT GCTGGGGATG CGGTGGGCTC TATGGCTTCT 3300 

GAGGCGGAAA GAACCAGCTG GGGCTCTAGG GGGTATCCCC ACGCGCCCTG TAGCGGCGCA 3360 

TTAAGCGCGG CGGGTGTGGT GGTTACGCGC AGCGTGACCG CTACACTTGC CAGCGCCCTA 3420 

GCGCCCGCTC GTTTCGCTTT CTTCCCTTCC TTTCTCGCCA CGTTCGCCGG CTTTCCCCGT 3480 

CAAGCTCTAA ATCGGGGCAT CCCTTTAGGG TTCCGATTTA GTGCTTTACG GCACCTCGAC 3540 

CCCAAAAAAC TTGATTAGGG TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT 3600 

TTTCGCCCTT TGACGTTGGA GTCGACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA 3660 
ACAACACTCA ACCCTATCTC GGTCTATTCT TTTGATTTAT AAGGGATTTT GGGGATTTGG 3720

GCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA ACGCGAATTA ATTCTGTGGA 3780

ATGTGTGTCA GTTAGGGTGT GGAAAGTCCC CAGGCTCCCC AGGCAGGGAG AAGTATGCAA 3840 

AGCATGCATC TCAATTAGTC AGCAACCAGG TGTGGAAAGT CCCCAGGCTC CCCAGCAGGC 3900 

AGAAGTATGC AAAGCATGCA TGTCAATTAG TCAGCAACCA TAGTCGCGCG CCTAACTCCG 3960

CCCATCCCGC CCCTAACTCG GCCCAGTTCC GCCCATTCTC GGCCCCATGG CTGACTAATT 4020

TTTTTTATTT ATGCAGAGGC CGAGGCCGGC TCTGCCTCTG AGCTATTCCA GAAGTAGTGA 4080 

GGAGGCTTTT TTGGAGGCCT AGGCTTTTGC AAAAAGCTCC CGGGAGCTTG TATATGCATT 4140 

TTCGGATCTG ATCAAGAGAC AGGATGAGGA TCGTTTCGCA TGATTGTACA AGATGGATTG 4200

CACGCAGGTT CTCCGGCCGC TTGGGTGGAG AGGCTATTCG GCTATGACTG GGCACAACAG 4260

ACAATCGGCT GGTCTGATGC CGCCGTGTTC CGGCTGTCAG CGCAGGGGCG CCCGGTTCTT 4320 

TTTGTCAAGA CCGACCTGTC CGGTGCCCTG AATGAACTGC AGGACGAGGC AGCGCGGCTA 4380 

TCGTGGCTGG CGACGACGGG CGTTGCTTGC GGAGGTGTGC TCGACGTTGT CACTGAAGCG 4440 

GGAAGGGACT GGGTGCTATT GGGGGAAGTG CCGGGGCAGG ATCTCCTGTC ATCTCACCTT 4500

GCTCCTGCCG AGAAAGTATC CATCATGGCT GATGGAATGC GGCGGGTGGA TACGCTTGAT 4560

CCGGCTACCT GCCCATTCGA CCAGGAAGCG AAACATCGCA TCGAGCGAGC AGGTACTCGG 4620 

ATGGAAGCCG GTCTTGTCGA TCAGGATGAT CTGGACGAAG AGCATCAGGG GCTCGCGCCA 4680 

GCCGAACTGT TCGCCAGGCT CAAGGCGCGC ATGCCCGACG GCGAGGATCT CGTCGTGACC 4740

CATGGCGATG CCTGCTTGCC GAATATCATG GTGGAAAATG GCCGCTTTTC TGGATTCATC 4800 

GACTGTGGCC GGCTGGGTGT GGCGGACCGC TATCAGGACA TAGCGTTGGC TACCGGTGAT 4860 



ATTGCTGAAG AGCTTGGCGG CGAATGGGCT GACCGCTTCC TCGTGCTTTA CGGTATCGCC 4920

GCTCCCGATT CGCAGCGCAT CGCCTTCTAT CGCCTTCTTG ACGAGTTCTT CTGAGCGGGA 4980

CTCTGGGGTT CGAAATGACC GACCAAGCGA CGCCCAACCT GCCATCACGA GATTTCGATT 5040 

CCACCGCCGC CTTCTATGAA AGGTTGGGCT TCGGAATCGT TTTCCGGGAC GCCGGCTGGA 5100 

TGATCCTCCA GCGCGGGGAT CTCATGCTGG AGTTCTTCGC CCACCCCAAC TTGTTTATTG 5160

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 5220

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGTA 5280 

TACCGTCGAC CTCTAGCTAG AGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA 5340 

ATTGTTATCC GCTCACAATT CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT 5400 

GGGGTGCCTA ATGAGTGAGC TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC 5460 

AGTCGGGAAA CCTGTCGTGC CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG 5520

GTTTGCGTAT TGGGCGCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC 5580 

GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG 5640

GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA 5700 

AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC 5760

GACGCTCAAG TCAGAGGTGG CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC 5820

CTGGAAGCTC CCTCGTGCGC TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG 5880 

CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCAATGCTC ACGCTGTAGG TATCTCAGTT 5940 

CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC 6000 

GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC 6060 

CACTGGCAGC AGCCACTGGT AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG 6120 

AGTTCTTGAA GTGGTGGCCT AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG 6180 

CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA 6240 

CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG 6300 

GATCTCAAGA AGATCCTTTG ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT 6360

CACGTTAAGG GATTTTGGTC ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA 6420 

ATTAAAAATG AAGTTTTAAA TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT 6480 

ACCAATGCTT AATCAGTGAG GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG 6540



TTGCCTGACT CCCCGTCGTG TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA 
GTGCTGCAAT GATACCGCGA GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC 
AGCCAGCCGG AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT 
CTATTAATTG TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG 
TTGTTGCCAT TGCTACAGGC ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA 
GCTCCGGTTC CCAACGATCA AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG 
TTAGCTCCTT CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA 
TGGTTATGGC AGCACTGCAT AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG 
TGACTGGTGA GTACTCAACC AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT 
CTTGCCCGGC GTCAATACGG GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA 
TCATTGGAAA AGGTTCTTCG GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA
GTTCGATGTA ACCCACTCGT GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG 
TTTCTGGGTG AGCAAAAACA GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC 
GGAAATGTTG AATACTCATA CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT 
ATTGTCTCAT GAGCGGATAC ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC 
CGCGCACATT TCCCCGAAAA GTGCCACCTG ACGTC 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8192 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 
CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 
CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 
TTAGGGTTAG GCGTTTTGCG CTGCTTCGCG ATGTACGGGC CAGATATACG CGTTGACATT 
GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 



TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGAGCG CCCAACGACC 360 

CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 420

ATTGACGTCA ATGGGTGGAC TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 480 

ATGATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 540 

ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 600 

TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 660 

ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 720 

AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 780 

GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA 840 

CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC 900 

GTTTAAACGG GCCCTCTAGA CTCGAGCGGC CGCCACTGTG GTGGATATCT GCAGAATTCG 960 

GCTTGGGATG ACGCCTCCTC CGCCCGGACG TGCCGCCCCC AGCGCACCGC GCGCCCGCGT 1020 

CCCTGGCCCG CCGGCTCGGT TGGGGCTTCC GCTGCGGCTG CGGCTGCTGC TGCTGCTCTG 1080 

GGCGGCCGCC GCCTCCGCCC AGGGCCACCT AAGGAGCGGA CCCCGCATCT TCGCCGTCTG 1140 

GAAAGGCCAT GTAGGGCAGG ACCGGGTGGA CTTTGGCCAG ACTGAGCCGC ACACGGTGCT 1200 

TTTCCACGAG CGAGGCAGCT CCTCTGTGTG GGTGGGAGGA CGTGGCAAGG TGTACCTCTT 1260

TGACTTCCCC GAGGGCAAGA ACGCATCTGT GCGCACGGTG AATATCGGCT CCACAAAGGG 1320 

GTCCTGTCTG GATAAGCGGG ACTGCGAGAA CTACATCACT CTCCTGGAGA GGCGGAGTGA 1380

GGGGCTGCTG GCCTGTGGCA CCAACGCCCG GCACCCCAGC TGCTGGAACC TGGTGAATGG 1440 

CACTGTGGTG CCACTTGGCG AGATGAGAGG CTACGCCCCC TTCAGCCCGG ACGAGAACTC 1500

CCTGGTTCTG TTTGAAGGGG ACGAGGTGTA TTCCACCATC CGGAAGCAGG AATACAATGG 1560 

GAAGATCCCT CGGTTCCGCC GCATCCGGGG CGAGAGTGAG CTGTACACCA GTGATACTGT 1620 

CATGCAGAAC CCACAGTTCA TCAAAGCCAC CATCGTGCAC CAAGACCAGG CTTACGATGA 1680 

CAAGATCTAC TACTTCTTCC GAGAGGACAA TCCTGACAAG AATCCTGAGG CTCCTCTCAA 1740 

TGTGTCCCGT GTGGCCCAGT TGTGCAGGGG GGACCAGGGT GGGGAAAGTT CACTGTCAGT 1800 

CTCCAAGTGG AACACTTTTC TGAAAGCCAT GCTGGTATGC AGTGATGCTG CCACCAACAA 1860 

GAACTTCAAC AGGCTGCAAG ACGTCTTCCT GCTCCCTGAC CCCAGCGGCC AGTGGAGGGA 1920 

CACCAGGGTC TATGGTGTTT TCTCCAACCC CTGGAACTAC TCAGCCGTCT GTGTGTATTC 1980 

CCTCGGTGAC ATTGACAAGG TCTTCCGTAC CTCCTCACTC AAGGGCTACC ACTCAAGCCT 2040 




TCCCAACCCG CGGCCTGGCA AGTGCCTCCC AGACCAGCAG CCGATACCCA CAGAGACCTT 2100 

CCAGGTGGCT GACCGTCACC CAGAGGTGGC GCAGAGGGTG GAGCCCATGG GGCCTCTGAA 2160 

GACGCCATTG TTCCACTCTA AATACCACTA CCAGAAAGTG GCCGTTCACC GCATGCAAGC 2220 

CAGCCACGGG GAGACCTTTC ATGTGCTTTA CCTAACTACA GACAGGGGCA CTATCCACAA 2280 

GGTGGTGGAA CCGGGGGAGC AGGAGCACAG CTTCGCCTTC AACATCATGG AGATCCAGCC 2340 

CTTCCGCCGC GCGGCTGCCA TCCAGACCAT GTCGCTGGAT GCTGAGCGGA GGAAGCTGTA 2400 

TGTGAGCTCC CAGTGGGAGG TGAGCCAGGT GCCCCTGGAC CTGTGTGAGG TCTATGGCGG 2460 

GGGCTGCCAC GGTTGCCTCA TGTCCCGAGA GCCCTACTGC GGCTGGGACC AGGGCCGCTG 2520 

CATCTCCATC TACAGCTCCG AACGGTCAGT GCTGCAATCC ATTAATCCAG CCGAGCCACA 2580 

CAAGGAGTGT CCCAACCCCA AACCAGACAA GGGCCCACTG CAGAAGGTTT CCCTGGCCCC 2640 

AAACTCTCGC TACTACCTGA GCTGCCCCAT GGAATCCCGC CACGCCACCT ACTCATGGCG 2700 

CCACAAGGAG AACGTGGAGC AGAGCTGCGA ACCTGGTCAC CAGAGCCCCA ACTGCATCCT 2760 

GTTCATCGAG AACCTCACGG CGCAGCAGTA CGGCCACTAC TTCTGCGAGG CCCAGGAGGG 2820

CTCCTACTTC GGCGAGGCTC AGCACTGGCA GCTGCTGCCC GAGGACGGCA TCATGGCCGA 2880 

GCACCTGCTG GGTCATGCCT GTGCCCTGGC TGCCTCCCTC TGGCTGGGGG TGCTGCCCAC 2940 

ACTCACTCTT GGCTTGCTGG TCCACATGGT GAGCAAGGGC GAGGAGCTGT TCACCGGGGT 3000 

GGTGCCCATC CTGGTCGAGC TGGACGGCGA CGTAAACGGC CACAAGTTCA GCGTGTCCGG 3060

CGAGGGCGAG GGCGATGCCA CCTACGGCAA GCTGACCCTG AAGTTCATCT GCACCACCGG 3120 

CAAGCTGCCC GTGCCCTGGC CCACCCTCGT GACCACCCTG ACCTACGGCG TGCAGTGCTT 3180 

CAGCCGCTAC CCCGACCACA TGAAGCAGCA CGACTTCTTC AAGTCCGCCA TGCCCGAAGG 3240 

CTACGTCCAG GAGCGCACCA TCTTCTTCAA GGACGACGGC AACTACAAGA CCCGCGCCGA 3300 
GGTGAAGTTC GAGGGCGACA CCCTGGTGAA CCGCATCGAG CTGAAGGGCA TCGACTTCAA 3360

GGAGGACGGC AACATCCTGG GGCACAAGCT GGAGTACAAC TACAACAGCC ACAACGTCTA 3420 

TATCATGGCC GACAAGCAGA AGAACGGCAT CAAGGTGAAC TTCAAGATCC GCCACAACAT 3480 

CGAGGACGGC AGCGTGCAGC TCGCCGACCA CTACCAGCAG AACACCCCCA TCGGCGACGG 3540 

CCCCGTGCTG CTGCCCGACA ACCACTACCT GAGCACCCAG TCCGCCCTGA GCAAAGACCC 3600 

CAACGAGAAG CGCGATCACA TGGTCCTGCT GGAGTTCGTG ACCGCCGCCG GGATCACTCT 3660 

CGGCATGGAC GAGCTGTACA AGGTGAAGCT TGGGCCCGAA CAAAAACTCA TCTCAGAAGA 3720 



GGATCTGAAT AGCGCCGTCG ACCATCATCA TCATCATCAT TGAGTTTAAA CCGCTGATCA 3780 

GCCTCGACTG TGCCTTCTAG TTGCCAGCCA TCTGTTGTTT GCCCCTCCCC CGTGCCTTCC 3840 

TTGACCCTGG AAGGTGCCAC TCCCACTGTC CTTTCCTAAT AAAATGAGGA AATTGCATCG 3900 

CATTGTCTGA GTAGGTGTCA TTCTATTCTG GGGGGTGGGG TGGGGCAGGA CAGCAAGGGG 3960 

GAGGATTGGG AAGACAATAG CAGGCATGCT GGGGATGCGG TGGGCTCTAT GGCTTCTGAG 4020 

GCGGAAAGAA CCAGCTGGGG CTCTAGGGGG TATCCCCACG CGCCCTGTAG CGGCGCATTA 4080 

AGCGCGGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG 4140 

CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA 4200 

GCTCTAAATC GGGGCATCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC 4260 

AAAAAACTTG ATTAGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT 4320 

CGCCCTTTGA CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA 4380

ACACTCAACC CTATCTCGGT CTATTCTTTT GATTTATAAG GGATTTTGGG GATTTCGGCC 4440 

TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTAATT CTGTGGAATG 4500

TGTGTCAGTT AGGGTGTGGA AAGTCCCCAG GCTCCCCAGG CAGGCAGAAG TATGCAAAGC 4560 

ATGCATCTCA ATTAGTCAGC AACCAGGTGT GGAAAGTCCC CAGGCTCCCC AGCAGGCAGA 4620 

AGTATGCAAA GCATGCATCT CAATTAGTCA GCAACCATAG TCCCGCCCCT AACTCCGCCC 4680

ATCCCGCCCC TAACTCCGCC CAGTTCCGCC CATTCTCCGC CCCATGGCTG ACTAATTTTT 4740 

TTTATTTATG CAGAGGCCGA GGCCGCCTCT GCCTCTGAGC TATTCCAGAA GTAGTGAGGA 4800 

GGCTTTTTTG GAGGCCTAGG CTTTTGCAAA AAGCTCCCGG GAGCTTGTAT ATCCATTTTC 4860 

GGATCTGATC AAGAGACAGG ATGAGGATCG TTTCGCATGA TTGAACAAGA TGGATTGCAC 4920 

GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG CTATTCGGCT ATGACTGGGC ACAACAGACA 4980 

ATCGGCTGCT CTGATGCCGC CGTGTTCCGG CTGTCAGCGC AGGGGCGCCC GGTTGTTTTT 5040 

GTCAAGACCG ACCTGTCCGG TGCCCTGAAT GAACTGCAGG ACGAGGCAGC GCGGCTATCG 5100 

TGGCTGGCCA CGACGGGCGT TCCTTGCGCA GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA 5160 

AGGGACTGGC TGCTATTGGG CGAAGTGCCG GGGCAGGATC TCCTGTCATC TCACCTTGCT 5220 

CCTGCCGAGA AAGTATCCAT CATGGCTGAT GCAATGCGGC GGCTGCATAC GCTTGATCCG 5280 

GCTACCTGCC CATTCGACCA CCAAGCGAAA CATCGCATCG AGCGAGCACG TACTCGGATG 5340

GAAGCCGGTC TTGTCGATCA GGATGATCTG GACGAAGAGC ATCAGGGGCT CGCGCCAGCC 5400 

GAACTGTTCG CCAGGCTCAA GGCGCGCATG CCCGACGGCG AGGATCTCGT CGTGACCCAT 5460 



GGCGATGCCT GCTTGCCGAA TATCATGGTG GAAAATGGCC GCTTTTCTGG ATTCATCGAC 5520 

TGTGGCCGGC TGGGTGTGGC GGACCGCTAT CAGGACATAG CGTTGGCTAC CCGTGATATT 5580 

GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC CGCTTCCTCG TGCTTTACGG TATCGCCGCT 5640 

CCCGATTCGC AGCGCATCGC CTTCTATCGC CTTCTTGACG AGTTCTTCTG AGCGGGACTC 5700 

TGGGGTTCGA AATGACCGAC CAAGCGACGC CCAACCTGCC ATCACGAGAT TTCGATTCCA 5760 

CGGCCGCCTT CTATGAAAGG TTGGGCTTCG GAATCGTTTT CCGGGACGCC GGCTGGATGA 5820 

TCCTCCAGCG CGGGGATCTC ATGCTGGAGT TCTTCGCCCA CCCCAACTTG TTTATTGCAG 5880 

CTTATAATGG TTACAAATAA AGGAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 5940 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGTATAC 6000

CGTCGACCTC TAGCTAGAGC TTGGCGTAAT CATGGTCATA GCTGTTTCCT GTGTGAAATT 6060 

GTTATCCGCT CACAATTCCA CACAACATAC GAGCCGGAAG CATAAAGTGT AAAGCCTGGG 6120 

GTGCCTAATG AGTGAGCTAA CTCACATTAA TTGCGTTGCG CTCACTGCCC GCTTTCCAGT 6180 

CGGGAAACCT GTCGTGCCAG CTGCATTAAT GAATCGGCCA ACGCGCGGGG AGAGGCGGTT 6240 

TGCGTATTGG GCGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG GTCGTTCGGC 6300 

TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG GTTATCCACA GAATCAGGGG 6360 

ATAACGCAGG AAAGAACATG TGAGCAAAAG GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG 6420 

CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA CGAGCATCAC AAAAATCGAC 6480

GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG 6540 

GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT 6600 

TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC AATGCTCACG CTGTAGGTAT CTCAGTTCGG 6660 

TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC CCCCGTTCAG CCCGACCGCT 6720 

GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC 6780

TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT 6840 

TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC 6900 

TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC TTGATCCGGC AAACAAACCA 6960

CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT 7020 

CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC 7080 

GTTAAGGGAT TTTGGTCATG AGATTATCAA AAAGGATCTT CACCTAGATG CTTTTAAATT 7140 






AAAAATGAAG TTTTAAATGA ATCTAAAGTA TATATGAGTA AACTTGGTCT GACAGTTACC 7200

AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG 7260 

CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTAGCATCT GGCCCGAGTG 7320 

CTGCAATGAT ACCGCGAGAC CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC 7380 

CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT ATCCGCCTCC ATCCAGTCTA 7440

TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG 7500
TTGCCATTGC TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT 7560

CCGGTTCCCA ACGATCAAGG GGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGGGGTTA 7620 

GCTCCTTCGG TCCTCCGATG GTTGTCAGAA GTAAGTTGGC CGCAGTGTTA TCAGTCATGG 7680

TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATG CGTAAGATGC TTTTCTGTGA 7740

CTGGTGAGTA GTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT 7800

GCCCGGCGTC AATACGGGAT AATACCGCGC GAGATAGCAG AAGTTTAAAA GTGCTGATCA 7860

TTGGAAAACG TTCTTCGGGG CGAAAACTGT CAAGGATCTT AGCGCTGTTG AGATCCAGTT 7920 

CGATGTAACC CACTCGTGGA GCCAACTGAT CTTCAGCATC TTTTACTTTC ACGAGCGTTT 7980

CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAATAAA GGGAATAAGG GCGACACGGA 8040

AATGTTGAAT ACTCATACTC TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT 8100 

GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA TAAACAAATA GGGGTTCCGC 8160 

GCACATTTCC CCGAAAAGTG CCACCTGACG TC 8192 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7000 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

AGATCTCGGC CGCATATTAA GTGCATTGTT CTCGATACCG CTAAGTGCAT TGTTCTCGTT 60

AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC GATGGACAAG TGCATTGTTC 120

TCTTGCTGAA AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC AGTACCCGGG 180 



AGTACCCTCG ACCGCCGGAG TATAAATAGA GGCGCTTCGT CTACGGAGCG ACAATTCAAT 240 

TCAAACAAGC AAAGTGAACA GGTCGCTAAG CGAAAGCTAA GCAAATAAAC AAGCGCAGCT 300 

GAACAAGCTA AACAATCTGC AGTAAAGTGC AAGTTAAAGT GAATCAATTA AAAGTAACCA 360 

GCAACCAAGT AAATCAACTG CAACTACTGA AATCTGCCAA GAAGTAATTA TTGAATACAA 420 

GAAGAGAACT CTGAATACTT TCAACAAGTT ACCGAGAAAG AAGAACTCAC ACACAGCTAG 480 

CGTTTAAACT TAAGCTTGGT ACCGAGCTCG GATCCACTAG TCCAGTGTGG TGGAATTGGG 540

CTTGGGATGA CGCCTCCTCC GCCCGGACGT GCCGCGCCCA GCGCACCGCG CGCCCGCGTC 600 

CCTGGCCCGC CGGCTCGGTT GGGGCTTCCG CTGCGGCTGC GGCTGCTGCT GCTGCTCTGG 660 

GCGGCCGCCG CCTCCGCGCA GGGCCACGTA AGGAGCGGAC CGCGCATGTT CGCGGTCTGG 720

AAAGGCCATG TAGGGCAGGA GCGGGTGGAC TTTGGCCAGA CTGAGCCGCA GACGGTGCTT 780 

TTCCACGAGC CAGGCAGCTC CTCTGTGTGG GTGGGAGGAC GTGGCAAGGT CTACCTCTTT 840 

GACTTCCCCG AGGGCAAGAA CGCATCTGTG CGCACGGTGA ATATCGGCTC CACAAAGGGG 900 

TCCTGTCTGG ATAAGCGGGA CTGCGAGAAG TACATCACTC TCCTGGAGAG GCGGAGTGAG 960

GGGCTGCTGG CCTGTGGCAC CAACGCCCGG CAGCCCAGCT GCTGGAACCT GGTGAATGGC 1020

ACTGTGGTGC CAGTTGGCGA GATGAGAGGC TACGCCCCCT TGAGCCCGGA CGAGAACTCC 1080 

CTGGTTCTGT TTGAAGGGGA CGAGGTGTAT TGCACCATCC GGAAGCAGGA ATACAATGGG 1140 

AAGATCCCTC GGTTCCGCCG CATCCGGGGC GAGAGTGAGC TGTACACCAG TGATACTGTC 1200 

ATGCAGAACC CACAGTTCAT CAAAGCCACC ATCGTGCACC AAGACCAGGC TTACGATGAC 1260 

AAGATCTACT ACTTCTTCCG AGAGGACAAT CCTGACAAGA ATCCTGAGGC TCCTCTCAAT 1320 

GTGTCCCGTG TGGCCCAGTT GTGCAGGGGG GACCAGGGTG GGGAAAGTTC ACTGTCAGTC 1380 

TCCAAGTGGA ACACTTTTCT GAAAGCCATG CTGGTATGCA GTGATGCTGG CACCAACAAG 1440 

AACTTCAACA GGCTGCAAGA CGTCTTCCTG CTCCCTGACC CCAGCGGCCA GTGGAGGGAG 1500 

ACGAGGGTCT ATGGTGTTTT CTCCAACCCC TGGAACTACT CAGCCGTCTG TGTGTATTCC 1560

CTCGGTGACA TTGACAAGGT CTTCCGTACC TCCTCACTCA AGGGCTACCA CTCAAGCCTT 1620 

CCCAACCCGG GGCCTGGCAA GTGCCTCCCA GACCAGCAGC CGATACCCAC AGAGACCTTC 1680 

CAGGTGGCTG ACCGTCACCC AGAGGTGGCG CAGAGGGTGG AGCCCATGGG GCCTCTGAAG 1740

ACGCCATTGT TCCACTCTAA ATACCACTAC CAGAAAGTGG CCGTTCACCG CATGCAAGCC 1800 

AGCCACGGGG AGACCTTTCA TGTGCTTTAC CTAACTACAG ACAGGGGCAC TATCCACAAG 1860 

GTGGTGGAAC CGGGGGAGCA GGAGCACAGC TTCGCCTTCA ACATCATGGA GATCCAGCCC 1920



TTCCGCCGCG CGGCTGCCAT CCAGACCATG TCGCTGGATG CTGAGCGGAG GAAGCTGTAT 1980 

GTGAGCTCCC AGTGGGAGGT GAGCCAGGTG CCCCTGGACC TGTGTGAGGT CTATGGCGGG 2040 

GGCTGCCACG GTTGCCTCAT GTCCCGAGAC CCCTACTGCG GCTGGGAGCA GGGCCGCTGC 2100 

ATCTCCATCT ACAGCTCCGA ACGGTCAGTG CTGCAATCCA TTAATCCAGC CGAGCCACAC 2160 

AAGGAGTGTC CCAACCCCAA ACCAGACAAG GGCCCACTGC AGAAGGTTTC CCTGGCCCCA 2220 

AACTCTCGCT ACTACCTGAG CTGCCCCATG GAATCCCGCC ACGCCACCTA CTCATGGCGC 2280 

CACAAGGAGA ACGTGGAGCA GAGCTGCGAA CCTGGTCACC AGAGCCCCAA CTGCATCCTG 2340

TTCATCGAGA ACCTCACGGC GCAGCAGTAC GGCCACTACT TCTGCGAGGC CCAGGAGGGC 2400 

TCCTACTTCC GCGAGGCTCA GCACTGGCAG CTGCTGCCCG AGGACGGGAT CATGGCCGAG 2460 

CACCTGCTGG GTCATGCCTG TGCGCTGGCT GCCTCCCTCT GGCTGGGGGT GCTGCCCACA 2520 

CTCACTCTTG GCTTGCTGGT CCACGTGAAG CTTGGGCCCG TTTAAACCCG CTGATCAGCC 2580 

TCGACTGTGC CTTCTAGTTG CCAGCCATCT GTTGTTTGCC CCTCCCCCGT GCCTTCCTTG 2640 

ACCCTGGAAG GTGCCACTCC CACTGTCCTT TCCTAATAAA ATGAGGAAAT TGCATCGCAT 2700 

TGTCTGAGTA GGTGTCATTC TATTCTGGGG GGTGGGGTGG GGCAGGACAG CAAGGGGGAG 2760 

GATTGGGAAG ACAATAGCAG GCATGCTGGG GATGCGGTGG GCTCTATGGC TTCTGAGGCG 2820 

GAAAGAACCA GCTGGGGCTC TAGGGGGTAT CCCCACGCGC CCTGTAGCGG CGCATTAAGC 2880 

GCGGCGGGTG TGGTGGTTAC GCGCAGCGTG ACCGCTACAC TTGCCAGCGC CCTAGCGCCC 2940 

GCTCCTTTCG CTTTCTTCCC TTCCTTTCTC GGCACGTTCG CCGGCTTTCC CCGTCAAGCT 3000 

CTAAATCGGG GCATCCCTTT AGGGTTCCGA TTTAGTGCTT TACGGCACCT CGACCCCAAA 3060 

AAACTTGATT AGGGTGATGG TTCACGTAGT GGGCCATCGC CCTGATAGAC GGTTTTTCGC 3120 

CCTTTGACGT TGGAGTCCAC GTTCTTTAAT AGTGGACTCT TGTTCCAAAC TGGAACAACA 3180 

CTCAACCCTA TCTCGGTCTA TTCTTTTGAT TTATAAGGGA TTTTGGGGAT TTCGGCCTAT 3240

TGGTTAAAAA ATGAGCTGAT TTAACAAAAA TTTAACGCGA ATTAATTCTG TGGAATGTGT 3300 

GTCAGTTAGG GTGTGGAAAG TCCCCAGGCT CCCCAGGCAG GCAGAAGTAT GCAAAGCATG 3360 

CATCTCAATT AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG GCTCCCCAGC AGGCAGAAGT 3420 

ATGCAAAGCA TGCATCTCAA TTAGTCAGCA ACCATAGTCC CGCCCCTAAC TCCGCCCATC 3480

CCGCCCCTAA CTCCGCCCAG TTCCGCCCAT TCTCCGCCCC ATGGCTGACT AATTTTTTTT 3540

ATTTATGCAG AGGCCGAGGC CGCCTCTGCC TCTGAGCTAT TCCAGAAGTA GTGAGGAGGC 3600 



TTTTTTGGAG GCCTAGGCTT TTGCAAAAAG CTCCCGGGAG CTTGTATATC CATTTTCGGA 3660 

TCTGATCAAG AGACAGGATG AGGATCGTTT CGCATGATTG AACAAGATGG ATTGCACGCA 3720 

GGTTCTCCGG CCGCTTGGGT GGAGAGGCTA TTCGGCTATG ACTGGGCACA ACAGACAATC 3780 

GGCTGCTCTG ATGCCGCCGT GTTCCGGCTG TCAGCGCAGG GGCGCCCGGT TCTTTTTGTC 3840

AAGACCGACC TGTCCGGTGC CCTGAATGAA CTGCAGGACG AGGCAGCGCG GCTATCGTGG 3900 

CTGGCCACGA CGGGCGTTCC TTGCGCAGCT GTGCTCGACG TTGTCACTGA AGCGGGAAGG 3960

GACTGGCTGC TATTGGGCGA AGTGCCGGGG CAGGATCTCC TGTCATCTCA CCTTGCTCCT 4020

GCCGAGAAAG TATCCATCAT GGCTGATGCA ATGCGGCGGC TGCATACGCT TGATCCGGCT 4080

ACCTGCCCAT TCGACCACCA AGCGAAACAT CGCATCGAGC GAGCACGTAC TCGGATGGAA 4140 

GCCGGTCTTG TCGATCAGGA TGATCTGGAC GAAGAGCATC AGGGGCTCGC GCCAGCCGAA 4200 

CTGTTCGCCA GGCTCAAGGC GCGCATGCCC GACGGCGAGG ATCTCGTCGT GACCCATGGC 4260 

GATGCCTGCT TGCCGAATAT GATGGTGGAA AATGGCCGCT TTTCTGGATT CATCGACTGT 4320

GGCCGGCTGG GTGTGGCGGA CCGCTATCAG GACATAGCGT TGGCTACCCG TGATATTGCT 4380 

GAAGAGCTTG GCGGCGAATG GGCTGACCGC TTCCTCGTGC TTTACGGTAT CGCCGCTCCC 4440 

GATTCGCAGC GCATCGCCTT CTATCGCCTT CTTGACGAGT TCTTCTGAGC GGGACTCTGG 4500

GGTTCGAAAT GACGGACCAA GCGACGCCCA ACCTGCCATC ACGAGATTTC GATTCCACCG 4560

CCGCCTTCTA TGAAAGGTTG GGCTTCGGAA TCGTTTTCCG GGACGCCGGC TGGATGATCC 4620 

TCCAGCGCGG GGATCTCATG CTGGAGTTCT TCGCCCACCC CAACTTGTTT ATTGCAGCTT 4680 

ATAATGGTTA CAAATAAAGC AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC 4740

TGCATTCTAG TTGTGGTTTG TCCAAACTCA TCAATGTATC TTATCATGTC TGTATACCGT 4800

CGACCTCTAG CTAGAGCTTG GCGTAATCAT GGTCATAGCT GTTTCCTGTG TGAAATTGTT 4860 

ATCCGCTCAC AATTCCACAC AACATACGAG CCGGAAGCAT AAAGTGTAAA GCCTGGGGTG 4920

CCTAATGAGT GAGCTAACTC ACATTAATTG CGTTGCGCTC ACTGCCCGCT TTCCAGTCGG 4980 

GAAACCTGTC GTGCCAGCTG CATTAATGAA TCGGCCAACG CGCGGGGAGA GGCGGTTTGC 5040

GTATTGGGCG CTCTTCCGCT TCCTCGCTCA CTGACTCGCT GCGCTCGGTC GTTCGGCTGC 5100 

GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG TAATACGGTT ATCCACAGAA TCAGGGGATA 5160 

ACGCAGGAAA GAACATGTGA GCAAAAGGCC AGCAAAAGGC CAGGAACCGT AAAAAGGCCG 5220

CGTTGCTGGC GTTTTTCCAT AGGCTCCGCC CCCCTGACGA GCATCACAAA AATCGACGCT 5280 

CAAGTCAGAG GTGGCGAAAC CCGACAGGAC TATAAAGATA CCAGGCGTTT CCCCCTGGAA 5340 




GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC CGGATACCTG TCCGCCTTTC 5400 

TCCCTTCGGG AAGCGTGGCG CTTTCTCAAT GCTCACGCTG TAGGTATCTC AGTTCGGTGT 5460 

AGGTCGTTCG CTCCAAGCTG GGCTGTGTGC ACGAACCCCC CGTTCAGCCC GACCGCTGCG 5520 

CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGTAAG ACACGACTTA TCGCCACTGG 5580 

CAGCAGCCAC TGGTAACAGG ATTAGCAGAG CGAGGTATGT AGGCGGTGCT ACAGAGTTCT 5640 

TGAAGTGGTG GCCTAACTAG GGCTACACTA GAAGGACAGT ATTTGGTATC TGCGCTCTGC 5700 

TGAAGCCAGT TACCTTCGGA AAAAGAGTTG GTAGCTCTTG ATGCGGGAAA CAAACCACCG 5760

CTGGTAGCGG TGGTTTTTTT GTTTGCAAGC AGGAGATTAC GCGCAGAAAA AAAGGATCTC 5820

AAGAAGATCC TTTGATCTTT TCTAGGGGGT CTGACGCTCA GTGGAACGAA AACTCACGTT 5880

AAGGGATTTT GGTCATGAGA TTATCAAAAA GGATCTTCAC CTAGATCCTT TTAAATTAAA 5940

AATGAAGTTT TAAATCAATC TAAAGTATAT ATGAGTAAAC TTGGTCTGAC AGTTAGCAAT 6000 

GCTTAATCAG TGAGGCACCT ATCTCAGCGA TCTGTCTATT TCGTTGATCC ATAGTTGCGT 6060

GACTCCCCGT CGTGTAGATA ACTAGGATAC GGGAGGGCTT ACCATCTGGC CCGAGTGCTG 6120 

CAATGATACC GGGAGACCCA GGCTCACCGG CTCCAGATTT ATCAGCAATA AACGAGCCAG 6180 

CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC CGCGTGCATC CAGTCTATTA 6240

ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA TAGTTTGCGC AACGTTGTTG 6300

CCATTGCTAC AGGCATCGTG GTGTCACGCT CGTGGTTTGG TATGGCTTCA TTCAGCTCCG 6360 

GTTCCCAAGG ATCAAGGGGA GTTACATGAT CCCCCATGTT GTGCAAAAAA GCGGTTAGCT 6420

CCTTCGGTCC TCCGATCGTT GTGAGAAGTA AGTTGGCGGC AGTGTTATCA CTCATGGTTA 6480

TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT AAGATGCTTT TCTGTGACTG 6540 
GTGAGTACTC AACCAAGTCA TTGTGAG^T AGTGTAf GGG ^ TGCTCTTGCG 6600
CGGCGTCAAT ACGGGATAAT AGGGGGCCAG ATAGGAGAAC TTTAAAAGTG CTCATCATTG 6660
GAAAACGTTC TTCGGGGCGA AAACTCTGAA GGATCTTACC GCTGTTGAGA TCCAGTTCGA 6720

TGTAACCCAC TCGTGCACCC AACTGATCTT CAGCATCTTT TACTTTGACC AGCGTTTCTG 6780 

GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG AATAAGGGCG ACACGGAAAT 6840 

GTTGAATACT GATACTCTTC CTTTTTCAAT ATTATTGAAG CATTTATCAG GGTTATTGTC 6900 

TCATGAGCGG ATACATATTT GAATGTATTT AGAAAAATAA ACAAATAGGG GTTCCGCGCA 6960 

CATTTCCCCG AAAAGTGCCA CCTGACGTCG ACGGATCGGG 7000 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7108 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

AGATCTCGGC CGCATATTAA GTGCATTGTT CTCGATACCG CTAAGTGCAT TGTTCTCGTT 60 
AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC GATGGACAAG TGCATTGTTC 120

TCTTGCTGAA AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC AGTACCCGGG 180 
AGTACCCTCG ACCGCCGGAG TATAAATAGA GGCGCTTCGT CTACGGAGCG ACAATTCAAT 240

TCAAACAAGC AAAGTGAACA CGTCGCTAAG CGAAAGCTAA GCAAATAAAC AAGCGCAGCT 300

GAACAAGCTA AACAATCTGC AGTAAAGTGC AAGTTAAAGT GAATCAATTA AAAGTAACCA 360

GCAACCAAGT AAATCAACTG CAACTACTGA AATCTGCCAA GAAGTAATTA TTGAATACAA 420

GAAGAGAACT CTGAATACTT TCAACAAGTT ACCGAGAAAG AAGAACTCAC ACACAGCTAG 480

CGTTTAAACT TAAGCTTGGT ACCGAGCTCG GATCCACTAG TCCAGTGTGG TGGAATTCGG 540 

CTTGGGATGA CGCCTCCTCC GCCCGGACGT GCCGCCCCCA GCGCACCGCG CGCCCGCGTC 600 
CCTGGCCCGC CGGCTCGGTT GGGGCTTCCG CTGCGGCTGC GGCTGCTGCT GCTGCTCTGG 660

GCGGCCGCCG CCTCCGCCCA GGGCCACCTA AGGAGCGGAC CCCGCATCTT CGCCGTCTGG 720 

AAAGGCCATG TAGGGCAGGA CCGGGTGGAC TTTGGCCAGA CTGAGCCGCA CACGGTGCTT 780 
TTCCACGAGC CAGGCAGCTC CTCTGTGTGG GTGGGAGGAC GTGGCAAGGT CTACCTCTTT 840

GACTTCCCCG AGGGCAAGAA CGCATCTGTG CGCACGGTGA ATATCGGCTC CACAAAGGGG 900

TCCTGTCTGG ATAAGCGGGA CTGCGAGAAC TACATCACTC TCCTGGAGAG GCGGAGTGAG 960

GGGCTGCTGG CCTGTGGCAC CAACGCCCGG CACCCCAGCT GCTGGAACCT GGTGAATGGC 1020 

ACTGTGGTGC CACTTGGCGA GATGAGAGGC TACGCCCCCT TCAGCCCGGA CGAGAACTCC 1080 

CTGGTTCTGT TTGAAGGGGA CGAGGTGTAT TCCACCATCC GGAAGCAGGA ATACAATGGG 1140 

AAGATCCCTC GGTTCCGCCG CATCCGGGGC GAGAGTGAGC TGTACACCAG TGATACTGTC 1200 

ATGCAGAACC CACAGTTCAT CAAAGCCACC ATCGTGCACC AAGACCAGGC TTACGATGAC 1260 



AAGATCTACT ACTTCTTCCG AGAGGACAAT CCTGACAAGA ATCCTGAGGC TCCTCTCAAT 1320 

GTGTCCCGTG TGGCCCAGTT GTGCAGGGGG GACCAGGGTG GGGAAAGTTC ACTGTCAGTC 1380 

TCCAAGTGGA ACACTTTTCT GAAAGCCATG CTGGTATGCA GTGATGCTGC CACCAACAAG 1440

AACTTCAACA GGCTGCAAGA CGTCTTCCTG CTCCCTGACC CCAGCGGCCA GTGGAGGGAC 1500 

ACCAGGGTCT ATGGTGTTTT CTCCAACCCC TGGAACTACT CAGCCGTCTG TGTGTATTCC 1560 

CTCGGTGACA TTGACAAGGT CTTCCGTACC TCCTCACTCA AGGGCTACCA CTCAAGCCTT 1620 

CCCAACCCGC GGCCTGGCAA GTGCCTCCCA GACCAGCAGC CGATACCCAC AGAGACCTTC 1680 

CAGGTGGCTG ACCGTCACCC AGAGGTGGCG CAGAGGGTGG AGCCCATGGG GCCTCTGAAG 1740 

ACGCCATTGT TCCACTCTAA ATACCACTAC CAGAAAGTGG CCGTTCACCG CATGCAAGCC 1800 

AGCCACGGGG AGACCTTTCA TGTGCTTTAC CTAACTACAG ACAGGGGCAC TATCCACAAG 1860 

GTGGTGGAAC CGGGGGAGCA GGAGCACAGC TTCGCCTTCA ACATCATGGA GATCCAGCCC 1920 

TTCCGCCGCG CGGCTGCCAT CCAGACCATG TCGCTGGATG CTGAGCGGAG GAAGCTGTAT 1980 

GTGAGCTCCC AGTGGGAGGT GAGCCAGGTG CCCCTGGACC TGTGTGAGGT CTATGGCGGG 2040

GGCTGCCACG GTTGCCTCAT GTCCCGAGAC CCCTACTGCG GCTGGGACCA GGGCCGCTGC 2100 

ATCTCCATCT ACAGCTCCGA ACGGTCAGTG CTGCAATCCA TTAATCCAGC CGAGCCACAC 2160 

AAGGAGTGTC CCAACCCCAA ACCAGACAAG GCCCCACTGC AGAAGGTTTC CCTGGCCCCA 2220 

AACTCTCGCT ACTACCTGAG CTGCCCCATG GAATCCCGCC ACGCCACCTA CTCATGGCGC 2280 

CACAAGGAGA ACGTGGAGCA GAGCTGCGAA CCTGGTCACC AGAGCCCCAA CTGCATCCTG 2340 

TTCATCGAGA ACCTCACGGC GCAGCAGTAC GGCCACTACT TCTGCGAGGC CCAGGAGGGC 2400 

TCCTACTTCC GCGAGGCTCA GCACTGGCAG CTGCTGCCCG AGGACGGCAT CATGGCCGAG 2460 

CACCTGCTGG GTCATGCCTG TGCCCTGGCT GCCTCCCTCT GGCTGGGGGT GCTGCCCACA 2520 

CTCACTCTTG GCTTGCTGGT CCACGTGAAG CTTGGGCCCG AACAAAAACT CATCTCAGAA 2580 

GAGGATCTGA ATAGCGCCGT CGACCATCAT CATCATCATC ATTGAGTTTA TCCAGCACAG 2640 

TGGCGGCCGC TCGAGTCTAG AGGGCCCGTT TAAACCCGCT GATCAGCCTC GACTGTGCCT 2700 

TCTAGTTGCC AGCCATCTGT TGTTTGCCCC TCCCCCGTGC CTTCCTTGAC CCTGGAAGGT 2760

GCCACTCCCA CTGTCCTTTC CTAATAAAAT GAGGAAATTG CATCGCATTG TCTGAGTAGG 2820 

TGTCATTCTA TTCTGGGGGG TGGGGTGGGG CAGGACAGCA AGGGGGAGGA TTGGGAAGAC 2880 

AATAGCAGGC ATGCTGGGGA TGCGGTGGGC TCTATGGCTT CTGAGGCGGA AAGAACCAGC 2940 

TGGGGCTCTA GGGGGTATCC CCACGCGCCC TGTAGCGGCG CATTAAGCGC GGCGGGTGTG 3000 



GTGGTTACGC GCAGCGTGAC CGCTACACTT GCCAGCGCCC TAGCGCCCGC TCCTTTCGCT 3060

TTCTTCCCTT CCTTTCTCGC CACGTTCGCC GGCTTTCCCC GTCAAGCTCT AAATCGGGGC 3120 

ATCCCTTTAG GGTTCCGATT TAGTGCTTTA CGGCACCTCG ACCCCAAAAA ACTTGATTAG 3180 

GGTGATGGTT CACGTAGTGG GCCATCGCCC TGATAGACGG TTTTTCGCCC TTTGACGTTG 3240 

GAGTCCACGT TCTTTAATAG TGGACTCTTG TTCCAAACTG GAACAACACT CAACCCTATC 3300 

TCGGTCTATT CTTTTGATTT ATAAGGGATT TTGGGGATTT CGGCCTATTG GTTAAAAAAT 3360 

GAGCTGATTT AACAAAAATT TAACGCGAAT TAATTCTGTG GAATGTGTGT CAGTTAGGGT 3420 

GTGGAAAGTC CCCAGGCTCC CCAGGCAGGC AGAAGTATGC AAAGCATGCA TCTCAATTAG 3480 

TCAGCAACCA GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT GCAAAGCATG 3540 

CATCTCAATT AGTCAGCAAC CATAGTCCCG CCCCTAACTC CGCCCATCCC GCCCCTAACT 3600 

CCGCCCAGTT CCGCCCATTC TCCGCCCCAT GGCTGACTAA TTTTTTTTAT TTATGCAGAG 3660 

GCCGAGGCCG CCTCTGCCTC TGAGCTATTC CAGAAGTAGT GAGGAGGCTT TTTTGGAGGC 3720 

CTAGGCTTTT GCAAAAAGCT CCCGGGAGCT TGTATATCCA TTTTCGGATC TGATCAAGAG 3780

ACAGGATGAG GATCGTTTCG CATGATTGAA CAAGATGGAT TGCACGCAGG TTCTCCGGCC 3840

GCTTGGGTGG AGAGGCTATT CGGCTATGAC TGGGCACAAC AGACAATCGG CTGCTCTGAT 3900 

GCCGCCGTGT TCCGGCTGTC AGCGCAGGGG CGCCCGGTTC TTTTTGTCAA GACCGACCTG 3960

TCCGGTGCCC TGAATGAACT GCAGGACGAG GCAGCGCGGC TATCGTGGCT GGCCACGACG 4020 

GGCGTTCCTT GCGCAGCTGT GCTCGACGTT GTCACTGAAG CGGGAAGGGA CTGGCTGCTA 4080 

TTGGGCGAAG TGCCGGGGCA GGATCTCCTG TCATCTCACC TTGCTCCTGC CGAGAAAGTA 4140 

TCCATCATGG CTGATGCAAT GCGGCGGCTG CATACGCTTG ATCCGGCTAC CTGCCCATTC 4200 

GACCACCAAG CGAAACATCG CATCGAGCGA GCACGTACTC GGATGGAAGC CGGTCTTGTC 4260 

GATCAGGATG ATCTGGACGA AGAGCATCAG GGGCTCGCGC CAGCCGAACT GTTCGCCAGG 4320 

CTCAAGGCGC GCATGCCCGA CGGCGAGGAT CTCGTCGTGA CCCATGGCGA TGGCTGCTTG 4380 

CCGAATATCA TGGTGGAAAA TGGCCGCTTT TCTGGATTCA TCGACTGTGG CCGGCTGGGT 4440 

GTGGCGGACC GCTATCAGGA CATAGCGTTG GCTACCCGTG ATATTGCTGA AGAGCTTGGC 4500

GGCGAATGGG CTGACCGCTT CCTCGTGCTT TACGGTATCG CCGCTCCCGA TTCGCAGCGC 4560 

ATCGCCTTCT ATCGCCTTCT TGACGAGTTC TTCTGAGCGG GACTCTGGGG TTCGAAATGA 4620 

CCGACCAAGC GACGCCCAAC CTGCCATCAC GAGATTTCGA TTCCACCGCC GCCTTCTATG 4680 




AAAGGTTGGG CTTCGGAATC GTTTTCCGGG ACGCCGGCTG GATGATCCTC CAGCGCGGGG 4740 

ATCTCATGCT GGAGTTCTTC GCCCACCCCA ACTTGTTTAT TGCAGCTTAT AATGGTTACA 4800

AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT TTTTTCACTG CATTCTAGTT 4860 

GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG TATACCGTCG ACCTCTAGCT 4920 

AGAGCTTGGC GTAATCATGG TCATAGCTGT TTCCTGTGTG AAATTGTTAT CCGCTCACAA 4980 

TTCCACACAA CATACGAGCC GGAAGCATAA AGTGTAAAGC CTGGGGTGCC TAATGAGTGA 5040 

GCTAACTCAC ATTAATTGCG TTGCGCTCAC TGCCCGCTTT GCAGTCGGGA AACCTGTCGT 5100 

GCCAGCTGCA TTAATGAATC GGCCAACGCG CGGGGAGAGG CGGTTTGCGT ATTGGGCGGT 5160 

CTTCCGCTTC CTCGCTCACT GACTCGCTGC GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT 5220 

CAGCTCACTC AAAGGCGGTA ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA 5280 

ACATGTGAGC AAAAGGCCAG GAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT 5340 

TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA TCGACGCTCA AGTCAGAGGT 5400 

GGCGAAACCC GACAGGACTA TAAAGATACC AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC 5460 

GCTCTCCTGT TCCGACCCTG CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA 5520 

GCGTGGCGCT TTCTCAATGC TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT 5580 

CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA CCGCTGCGCC TTATCCGGTA 5640 

ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC GCCACTGGCA GCAGCCACTG 5700 

GTAACAGGAT TAGCAGAGCG AGGTATGTAG GCGGTGGTAC AGAGTTCTTG AAGTGGTGGC 5760 

CTAACTACGG CTACACTAGA AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA 5820 

CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT CCGGCAAACA AACCACCGCT GGTAGCGGTG 5880 

GTTTTTTTGT TTGCAAGCAG GAGATTACGC GCAGAAAAAA AGGATCTCAA GAAGATCCTT 5940 

TGATCTTTTC TACGGGGTGT GAGGGTCAGT GGAAGGAAAA CTCACGTTAA GGGATTTTGG 6000

TCATGAGATT ATCAAAAAGG ATCTTCAGCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA 6060 

AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG TTACCAATGC TTAATCAGTG 6120

AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG 6180 

TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCGC CAGTGCTGCA ATGATAGCGC 6240 

GAGACCCACG CTCACCGGCT CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGGCG 6300 

AGCGGAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA GTCTATTAAT TGTTGCCGGG 6360

AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG 6420 



GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT 6480 

CAAGGCGAGT TACATGATCC CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC 6540 

CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT CATGGTTATG GCAGCACTGC 6600

ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA 6660 

CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC 6720

GGGATAATAC CGCGCCACAT AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT 6780 

CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC CAGTTCGATG TAACCCACTC 6840 

GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA 6900 

CAGGAAGGCA AAATGCCGCA AAAAAGGGAA TAAGGGCGAC ACGGAAATGT TGAATACTCA 6960

TACTCTTCCT TTTTCAATAT TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT 7020 

ACATATTTGA ATGTATTTAG AAAAATAAAC AAATAGGGGT TCCGCGCACA TTTCCCCGAA 7080 

AAGTGGCACC TGACGTCGAC GGATCGGG 7108 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4019 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

CTCGAGAAAT CATAAAAAAT TTATTTGCTT TGTGAGCGGA TAACAATTAT AATAGATTGA 60

ATTGTGAGCG GATAACAATT TCACACAGAA TTCATTAAAG AGGAGAAATT AACTATGAGA 120

GGATCGCATC ACCATCACCA TCACGGATCC CTGGTTCTGT TTGAAGGGGA CGAGGTGTAT 180

TCCACCATCC GGAAGCAGGA ATACAATGGG AAGATCCCTC GGTTCCGCCG CATCCGGGGC 240

GAGAGTGAGC TGTACACCAG TGATACTGTG ATGCAGAACC CACAGTTCAT CAAAGCCACC 300 

ATCGTGCACC AAGACCAGGC TTACGATGAC AAGATCTACT ACTTCTTCCG AGAGGACAAT 360 

CCTGACAAGA ATCCTGAGGC TCCTCTCAAT GTGTCCCGTG TGGCCCAGTT GTGCAGGGGG 420 

GACCAGGGTG GGGAAAGTTC ACTGTCAGTC TCCAAGTGGA ACACTTTTCT GAAAGCCATG 480

CTGGTATGCA GTGATGCTGC CACCAACAAG AACTTCAACA GGCTGCAAGA CGTCTTCCTG 540 



CTCCCTGACC CCAGCGGCCA GTGGAGGGAC ACCAGGGTCT ATGGTGTTTT CTCCAACCCC 600 

TGGAACTACT CAGCCGTCTG TGTGTATTCC CTCGGTGACA TTGACAAGGT CTTCCGTACC 660 

TCCTCACTCA AGGGCTACCA CTCAAGCCTT CCCAACCCGC GGCCTGGCAA GTGCCTCCCA 720 

GACCAGCAGC CGATACCCAC AGAAAGCTTA ATTAGCTGAG CTTGGACTCC TGTTGATAGA 780 

TCCAGTAATG ACCTCAGAAC TCCATCTGGA TTTGTTCAGA ACGCTCGGTT GCCGCCGGGC 840 

GTTTTTTATT GGTGAGAATC CAAGCTAGCT TGGCGAGATT TTCAGGAGCT AAGGAAGCTA 900 

AAATGGAGAA AAAAATCACT GGATATACCA CCGTTGATAT ATCCCAATGG CATCGTAAAG 960 

AACATTTTGA GGCATTTCAG TCAGTTGCTC AATGTACCTA TAACCAGACC GTTCAGCTGG 1020 

ATATTACGGC CTTTTTAAAG ACCGTAAAGA AAAATAAGCA CAAGTTTTAT CCGGCCTTTA 1080 

TTCACATTCT TGCCCGGCTG ATGAATGCTG ATCCGGAATT TCGTATGGCA ATGAAAGACG 1140 

GTGAGCTGGT GATATGGGAT AGTGTTCACC CTTGTTACAC CGTTTTCCAT GAGCAAAGTG 1200 

AAACGTTTTC ATCGCTCTGG AGTGAATACC ACGACGATTT CCGGCAGTTT CTACACATAT 1260 

ATTCGCAAGA TGTGGCGTGT TACGGTGAAA ACCTGGCCTA TTTCCCTAAA GGGTTTATTG 1320 

AGAATATGTT TTTCGTCTCA GCCAATCCCT GGGTGAGTTT CACCAGTTTT GATTTAAACG 1380 

TGGCCAATAT GGACAACTTC TTCGCCCCCG TTTTCACCAT GGGCAAATAT TATACGCAAG 1440 

GCGACAAGGT GCTGATGCCG CTGGCGATTC AGGTTCATCA TGCCGTCTGT GATGGCTTCC 1500 

ATGTCGGCAG AATGCTTAAT GAATTACAAC AGTACTGCGA TGAGTGGCAG GGCGGGGCGT 1560 

AATTTTTTTA AGGCAGTTAT TGGTGCCCTT AAACGCCTGG GGTAATGACT CTCTAGCTTG 1620 

AGGCATCAAA TAAAACGAAA GGCTCAGTCG AAAGAGTGGG CCTTTCGTTT TATCTGTTGT 1680 

TTGTCGGTGA ACGCTCTCCT GAGTAGGACA AATCCGCCGC TCTAGAGCTG CCTCGCGCGT 1740 

TTCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC CGGAGACGGT CACAGCTTGT 1800 

CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG CGTCAGCGGG TGTTGGCGGG 1860 

TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG GAGTGTATAC TGGCTTAACT 1920

ATGCGGCATC AGAGCAGATT GTACTGAGAG TGCACCATAT GCGGTGTGAA ATACCGCACA 1980 

GATGCGTAAG GAGAAAATAC CGCATCAGGC GCTCTTCCGC TTCCTCGCTC ACTGACTCGC 2040 

TGCGCTCGGT GTGTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAAAGGCG GTAATACGGT 2100 

TATCCACAGA ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC CAGCAAAAGG 2160 

CCAGGAACCG TAAAAAGGCC GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC CCCCCTGACG 2220 






AGCATCACAA AAATCGACGC TCAAGTCAGA GGTGGCGAAA CCCGACAGGA CTATAAAGAT 2280 

ACCAGGCGTT TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC CTGCCGCTTA 2340 

CCGGATACCT GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAA TGCTCACGCT 2400 

GTAGGTATCT CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG CACGAACCCC 2460 

CCGTTCAGCC CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC AACCCGGTAA 2520 

GACACGACTT ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA GCGAGGTATG 2580 

TAGGCGGTGC TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT AGAAGGACAG 2640 

TATTTGGTAT CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGAGTT GGTAGCTCTT 2700

GATCCGGCAA ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG CAGCAGATTA 2760 

CGCGCAGAAA AAAAGGATCT CAAGAAGATC CTTTGATCTT TTCTACGGGG TCTGACGCTC 2820 

AGTGGAACGA AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA AGGATCTTCA 2880

CCTAGATCCT TTTAAATTAA AAATGAAGTT TTAAATCAAT CTAAAGTATA TATGAGTAAA 2940 

CTTGGTCTGA CAGTTACCAA TGCTTAATCA GTGAGGCACC TATCTCAGCG ATCTGTCTAT 3000 

TTCGTTGATC CATAGCTGCC TGACTCCCCG TCGTGTAGAT AACTACGATA CGGGAGGGCT 3060 

TACCATCTGG CCCCAGTGCT GCAATGATAC CGCGAGACCC ACGCTCACCG GCTCCAGATT 3120 

TATCAGCAAT AAACCAGCCA GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT GCAACTTTAT 3180 

CCGCCTCCAT CCAGTCTATT AATTGTTGCC GGGAAGCTAG AGTAAGTAGT TCGCCAGTTA 3240 

ATAGTTTGCG CAACGTTGTT GCCATTGCTA CAGGCATCGT GGTGTCACGC TCGTCGTTTG 3300 

GTATGGCTTC ATTCAGCTCC GGTTCCCAAC GATCAAGGCG AGTTACATGA TCCCCCATGT 3360 

TGTGCAAAAA AGCGGTTAGC TCCTTCGGTC CTCCGATCGT TGTCAGAAGT AAGTTGGCCG 3420 

CAGTGTTATC ACTCATGGTT ATGGCAGCAC TGCATAATTC TCTTACTGTC ATGCCATCCG 3480 

TAAGATGCTT TTCTGTGACT GGTGAGTACT CAACCAAGTC ATTCTGAGAA TAGTGTATGC 3540 

GGCGACCGAG TTGCTCTTGC CCGGCGTCAA TACGGGATAA TACCGCGCCA CATAGCAGAA 3600 

CTTTAAAAGT GCTCATCATT GGAAAACGTT CTTCGGGGCG AAAACTCTCA AGGATCTTAC 3660

CGCTGTTGAG ATCCAGTTCG ATGTAACCCA CTCGTGCACC CAACTGATCT TCAGCATCTT 3720

TTACTTTCAC CAGCGTTTCT GGGTGAGCAA AAACAGGAAG GCAAAATGCC GCAAAAAAGG 3780

GAATAAGGGC GACACGGAAA TGTTGAATAC TCATACTCTT CCTTTTTCAA TATTATTGAA 3840 

GCATTTATCA GGGTTATTGT CTCATGAGCG GATACATATT TGAATGTATT TAGAAAAATA 3900 

AACAAATAGG GGTTCCGCGC ACATTTCCCC GAAAAGTGCC ACCTGACGTC TAAGAAACCA 3960 



TTATTATCAT GACATTAACC TATAAAAATA GGCGTATCAC GAGGCCCTTT CGTCTTCAC 4019 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3999 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

CTCGAGAAAT CATAAAAAAT TTATTTGCTT TGTGAGCGGA TAACAATTAT AATAGATTCA 60 

ATTGTGAGCG GATAACAATT TCACACAGAA TTCATTAAAG AGGAGAAATT AACTATGAGA 120 

GGATCGCATC ACCATCACCA TCACACGGAT CCGCATGCGA GCTCCCAGTG GGAGGTGAGC 180 

CAGGTGCCCC TGGACCTGTG TGAGGTCTAT GGCGGGGGCT GCCACGGTTG CCTCATGTCC 240 

CGAGACCCCT ACTGCGGCTG GGACCAGGGC CGCTGCATCT CCATCTACAG CTCCGAACGG 300 
TCAGTGCTGC AATCCATTAA TCCAGCCGAG CCACACAAGG AGTGTCCCAA CCCCAAACCA 360

GACAAGGCCC CACTGCAGAA GGTTTCCCTG GCCCCAAACT CTCGCTACTA CCTGAGCTGC 420 

CCCATGGAAT CCCGCCACGC CACCTACTCA TGGCGCCACA AGGAGAACGT GGAGCAGAGC 480 

TGCGAACCTG GTCACCAGAG CCCCAACTGC ATCCTGTTCA TCGAGAACCT CACGGCGCAG 540

CAGTACGGCC ACTACTTCTG CGAGGCCCAG GAGGGCTCCT ACTTCCGCGA GGCTCAGCAC 600 

TGGCAGCTGC TGCCCGAGGA CGGCATCATG GCCGAGCACC TGCTGGGTCA TGCCTGTGCC 660 

CTGGCTGCCT CCCTCTGGCT GGGGGTGCTG CCCACACTCA CTCTTGGCTT GCTGGTCCAC 720 

GTGAAGCTTA ATTAGCTGAG CTTGGACTCC TGTTGATAGA TCCAGTAATG ACCTCAGAAC 780 

TCCATCTGGA TTTGTTCAGA ACGCTCGGTT GCCGCCGGGC GTTTTTTATT GGTGAGAATC 840 

CAAGCTAGCT TGGCGAGATT TTCAGGAGCT AAGGAAGCTA AAATGGAGAA AAAAATCACT 900 

GGATATACCA CCGTTGATAT ATCCCAATGG CATCGTAAAG AACATTTTGA GGCATTTCAG 960 

TCAGTTGCTC AATGTACCTA TAACCAGACC GTTCAGCTGG ATATTACGGC CTTTTTAAAG 1020 

ACCGTAAAGA AAAATAAGCA CAAGTTTTAT CCGGCCTTTA TTCACATTCT TGCCCGCCTG 1080 

ATGAATGCTC ATCCGGAATT TCGTATGGCA ATGAAAGACG GTGAGCTGGT GATATGGGAT 1140 

AGTGTTCACC CTTGTTACAC CGTTTTCCAT GAGCAAACTG AAACGTTTTC ATCGCTCTGG 1200 



AGTGAATACC ACGACGATTT CCGGCAGTTT CTACACATAT ATTCGCAAGA TGTGGCGTGT 1260 

TACGGTGAAA ACCTGGCCTA TTTCCCTAAA GGGTTTATTG AGAATATGTT TTTCGTCTCA 1320 

GCCAATCCCT GGGTGAGTTT CACCAGTTTT GATTTAAACG TGGCCAATAT GGACAACTTC 1380 

TTCGCCCCCG TTTTCACCAT GGGCAAATAT TATACGCAAG GCGACAAGGT GCTGATGCCG 1440

CTGGCGATTC AGGTTCATCA TGCCGTCTGT GATGGCTTCC ATGTCGGCAG AATGCTTAAT 1500 

GAATTACAAC AGTACTGCGA TGAGTGGCAG GGCGGGGCGT AATTTTTTTA AGGCAGTTAT 1560 

TGGTGCCCTT AAACGCCTGG GGTAATGACT CTCTAGCTTG AGGCATCAAA TAAAACGAAA 1620 

GGCTCAGTCG AAAGACTGGG CCTTTCGTTT TATCTGTTGT TTGTCGGTGA ACGCTCTCCT 1680 

GAGTAGGACA AATCCGCCGC TCTAGAGCTG CCTCGCGCGT TTCGGTGATG ACGGTGAAAA 1740 

CCTCTGACAC ATGCAGCTCC CGGAGACGGT CACAGCTTGT CTGTAAGCGG ATGCCGGGAG 1800 

CAGACAAGCC CGTCAGGGCG CGTCAGCGGG TGTTGGCGGG TGTCGGGGCG CAGCCATGAC 1860

CCAGTCACGT AGCGATAGCG GAGTGTATAC TGGCTTAACT ATGCGGCATC AGAGCAGATT 1920 

GTACTGAGAG TGCACCATAT GCGGTGTGAA ATACCGCACA GATGCGTAAG GAGAAAATAC 1980

CGCATCAGGC GCTCTTCCGC TTCCTCGCTC ACTGACTCGC TGCGCTCGGT CTGTCGGCTG 2040

CGGCGAGCGG TATCAGCTCA CTCAAAGGCG GTAATACGGT TATCCACAGA ATCAGGGGAT 2100 

AACGCAGGAA AGAACATGTG AGCAAAAGGC CAGCAAAAGG CCAGGAACCG TAAAAAGGCC 2160 

GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC CCCCCTGAGG AGCATCACAA AAATCGACGC 2220

TCAAGTCAGA GGTGGCGAAA CCCGACAGGA CTATAAAGAT ACCAGGCGTT TCCCCCTGGA 2280 

AGCTCCCTCG TGCGCTCTCC TGTTCCGACC CTGCCGCTTA CCGGATACCT GTCCGCCTTT 2340

CTCCCTTCGG GAAGCGTGGC GCTTTCTCAA TGCTCACGCT GTAGGTATCT CAGTTCGGTG 2400 

TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG CACGAACCCC CCGTTCAGCC CGACCGCTGC 2460

GCCTTATCCG GTAACTATCG TCTTGAGTCC AACCCGGTAA GAGACGACTT ATCGCCACTG 2520

GCAGCAGCCA CTGGTAACAG GATTAGCAGA GCGAGGTATG TAGGCGGTGC TACAGAGTTC 2580

TTGAAGTGGT GGCCTAACTA CGGCTACACT AGAAGGACAG TATTTGGTAT CTGCGCTCTG 2640

CTGAAGCCAG TTACCTTCGG AAAAAGAGTT GGTAGCTCTT GATCCGGCAA ACAAACCACC 2700 

GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG CAGCAGATTA CGCGCAGAAA AAAAGGATCT 2760 

CAAGAAGATC CTTTGATCTT TTCTACGGGG TCTGACGCTC AGTGGAACGA AAACTCACGT 2820 

TAAGGGATTT TGGTCATGAG ATTATCAAAA AGGATCTTCA CCTAGATCCT TTTAAATTAA 2880 






AAATGAAGTT TTAAATCAAT CTAAAGTATA TATGAGTAAA CTTGGTCTGA CAGTTACCAA 2940

TGCTTAATCA GTGAGGCACC TATCTCAGCG ATCTGTCTAT TTCGTTCATC CATAGCTGCC 3000 

TGACTCCCCG TCGTGTAGAT AACTACGATA GGGGAGGGCT TACCATCTGG CCCCAGTGCT 3060 

GCAATGATAC GGCGAGAGCC ACGGTCACCG GGTCCAGATT TATCAGCAAT AAACCAGCCA 3120

GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT GGAACTTTAT GCGCCTCCAT CCAGTCTATT 3180 

AATTGTTGCC GGGAAGCTAG AGTAAGTAGT TCGCCAGTTA ATAGTTTGCG CAACGTTGTT 3240 

GCCATTGCTA CAGGCATCGT GGTGTCACGC TCGTCGTTTG GTATGGCTTC ATTCAGCTCC 3300 

GGTTCCCAAC GATCAAGGCG AGTTACATGA TCCCCGATGT TGTGCAAAAA AGCGGTTAGC 3360 

TCCTTCGGTC CTCCGATCGT TGTGAGAAGT AAGTTGGCCG GAGTGTTATC ACTGATGGTT 3420 

ATGGCAGCAC TGCATAATTC TCTTACTGTC ATGCCATCCG TAAGATGCTT TTCTGTGACT 3480 

GGTGAGTACT GAACCAAGTC ATTCTGAGAA TAGTGTATGC GGCGACCGAG TTGCTCTTGC 3540

CCGGCGTCAA TACGGGATAA TACCGCGCCA CATAGCAGAA CTTTAAAAGT GCTCATCATT 3600

GGAAAACGTT CTTCGGGGCG AAAACTCTCA AGGATCTTAC CGCTGTTGAG ATGCAGTTCG 3660

ATGTAACCCA CTCGTGCACC CAACTGATCT TCAGCATCTT TTACTTTCAC CAGCGTTTCT 3720

GGGTGAGCAA AAACAGGAAG GCAAAATGCC GCAAAAAAGG GAATAAGGGC GACACGGAAA 3780

TGTTGAATAC TCATACTCTT CCTTTTTGAA TATTATTGAA GCATTTATCA GGGTTATTGT 3840 

CTCATGAGCG GATACATATT TGAATGTATT TAGAAAAATA AACAAATAGG GGTTCCGCGC 3900 

ACATTTCCCC GAAAAGTGGC ACGTGACGTC TAAGAAACCA TTATTATGAT GACATTAACC 3960 

TATAAAAATA GGCGTATCAC GAGGCCCTTT CGTCTTCAC 3999 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8888 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

GAGCCGCACA CGGTGCTTTT CCACGAGCCA GGCAGCTCCT CTGTGTGGGT GGGAGGACGT 60

GGCAAGGTCT ACCTCTTTGA CTTCCCCGAG GGCAAGAACG CATCTGTGCG CACGGTGAGC 120



CTCTCTCTTC CCCCAACACC CGCCCTACCC TCTTATCTCC CCTCTGGCCC TGCCAAGGGT 180 

CCTCAGGGAA TCCGAGGGAG CTGGCTTCTC TTCCTAAACT GCCCCGACCT CCGTATCCTA 240 

TAAATGGCTC CTGGGGGAGG CTCCCTAAAG GTAGTCCAGA TTGGAGTGGG GAGCTGGGGC 300 

GGTGTGGAGA AAAACAGGAG CTAATGGGCC TGGCCAGCTG GGCAGCGCTG CTGCGGAAAG 360 

CCCAGGCTGG AAGCTGGGCC CCAGAGCCCA TGCCTGGTCT TCTGAACCCT GTGGGGCTCA 420

GCTCTGGATA TGAGACCCTG TTTGACCTCA GGTAGATCAC TCACCCTCTC AGAGCGCCAG 480

TTGCTCATCT GTCAGATGAG AATAATGGTT GCTTCCTTTG GGGCTOIATCC TGAGGGTGTG 540 

TGGAAAGCAT TTCAGGGGTA CCTCACCGCT GGCAGATTGA ACTAATGCTT CTCCCGTTCC 600

CCAGGTGAAT ATCGGCTCCA CAAAGGGGTG CTGTGTGGAT AAGGGGGTGA GCGGGGGAGG 660

GATCTGGAGG GGTCTGAGCC ACTTGGTAAA GGGAGAGGAG ACCGTGAGGG TCTAAGGAAG 720

GAAGCATGGC CCTGCCCCAC GAGTCGCAGA CTGATGGGGA GACGTGGTCC TCTGTGCTTA 780

GGGGATGGCG TCAGCTGCAC AGACTGTGGG CTGTCCCGGG AGGCTGTCAC CTATGCTAAG 840
CCCTTCTGAC ACCTTCTTCC CTGATCCTGG GGGTGCTAGT GGTAGGCTTG CCAGGGCCTT 900 
CCAGCAACCA ATTTCTCTCG TCCCTTCTCT CTTCCCCGGG CAGGACTGCG AGAACTACAT 960 

CACTCTCCTG GAGAGGCGGA GTGAGGGGCT GCTGGGCTGT GGCACCAACG CCCGGCACCC 1020 

CAGCTGCTGG AACCTGGTGA GAAGGCTGCT CCGCATGTGC CTGATCAGCT CACCTTCTAC 1080

TGCGTGGGCT TCTGCCCCTC ATGGTGGGAA GGAGATGGCG AGACTCCAAT GCTGGCCTTG 1140

CCCTGGGAGG ATGGGGCTCC TGGCCGAGAA ACTGGCCGTC ATGGGAGGCA GTGGCTGTGG 1200

GATTATGTGG CGATCCAACC CTCTGGATCT CCCACAGGTG AATGGCACTG TGGTGCCACT 1260 

TGGCGAGATG AGAGGCTACG CCCCCTTCAG CCCGGACGAG AACTCCCTGG TTCTGTTTGA 1320 

AGGTTGGGGC ATGCTTCGGA ACTGGGCTGG GAGCAGGATG GTCAGCTCTT TGTCCAGTGT 1380 

CCGGAGGAGG GACTTCCAGG AGCTGCCTGC CCTTACTCAT TTCTCCCTCC CACTGACCCC 1440 

AGGGGACGAG GTGTATTCCA CCATCCGGAA GCAGGAATAC AATGGGAAGA TCCCTCGGTT 1500 

CCGCCGCATC CGGGGCGAGA GTGAGCTGTA CACCAGTGAT ACTGTCATGG AGAGTGAGTC 1560

AGGCTCCGGC TGGGCTGAGG GTGGGCAAGG GGGTGTGAGC ACTTAAGGTG GGAGATGGGA 1620 

TCCTGATGTT TCTGGGAGGG CTCCCTGAGG GCCGCTGGGG CCATGCAGGA AAGCAGGACC 1680 

TTGGTATAGG CCTGAGAAGT TAGGGTTGGC TGGGAGCAGA GGAACAGACA AGGTATAGCA 1740 

GTGGGATGGG CCCAGCCCTC TTCAGGAACA CAAACAGAGG GAGCCCCAGA CCCAGTGCAG 1800 

GGTCCCCAGG AGCCAAAGTT TATCCTCTGC TGAGTTCACG TGGAGGCAGC CCCCCAACTC 1860 



CCTCCTCATC AGGGCTCTGC CAATTGAGCA GAAGTGACAT AGGGGCGCCC AGGGACCTTC 1920 

CCCCACTCCC CAGGCATGAA GTCATTGCTC CTGGGCCGAT GACATCTTTG TAGGAAGAGG 1980 

GCAAAACAGG TGTGGGGTGG AGGTGCAGGG TCTAGGGCCC CTCGGGGAGT TGGACCTGAT 2040

GTTATGAGTC CTATTCCAGA TCTGATTTGC CATGGTTTGT GGAGACCCGA AGGAGGGAGG 2100 

AGAGTGTGCA GGGTTGGAAT GGTCTCCCGG GCAAGCTTCC CAGCCTTACG CCCATTCGCT 2160 

TCTGTGCCCT GGCAGACCGA CAGTTCATCA AAGCCACCAT CGTGCACCAA GACCAGGCTT 2220 

ACGATGACAA GATCTACTAC TTCTTCCGAG AGGACAATCC TGACAAGAAT CCTGAGGCTC 2280 

CTCTCAATGT GTCCCGTGTG GCCCAGTTGT GCAGGGTGAA CACGGGCGTG AGGGCTGCTG 2340 

GCTACGTGTC TGTGCATGAA TAGGCCTGAG TGAGGGTGAG TTCTGTGTGT CCGTGTGCAT 2400 

GTAGAAGTTG TGTGGATGTA TGAGTGGGTC TGTGTCAGGG ACTGTGGGAG CAGCTGTGTG 2460 

TGCATGGAGC ATCATGTGTC TGTGTGTGGG TAAAGGTGGC TGAGCTCCTG TGCACGTATG 2520 

ATGGCGTGTG AGCGTGTGTA TGATGGGGTG TGTGTGTGTG TGTGTGTGTG TGTTTTGCCT 2580 

GTGTGAATGT GCTGTGCCAC GTATGTGGGT GCGTGAGTCA GTAAATGTGT GTCTGAGTCC 2640 

GTCTGCTCTG TGGGGACCTG GCACTCTCAC CTGCCCTGAC CCTGGGCACT GCTGGCCCTG 2700 

GGCTCTGGAT CAGCCAGGCC TGCTTGCAGG AGTCTCATCT GGAGACCTGC CCTGAGTCCT 2760

GGGGCACCCC CGGCAGGTCC TGGCCCCTCG CAGCCTGCCT TCCTCCTCTG GGCCCAGGTG 2820 

TTGATATTGC TGGCAGTGGT TTCCTGGGGT GTGTGGGGAA GCCCGGGCAG GTGCTGAGGG 2880 

GCCTCTTCTC CCCTCTACCC TTCCAGGGGG ACCAGGGTGG GGAAAGTTCA CTGTCAGTCT 2940 

CCAAGTGGAA CACTTTTCTG AAAGCCATGC TGGTATGCAG TGATGCTGCC ACCAACAAGA 3000 

ACTTCAACAG GCTGCAAGAC GTCTTCCTGC TCCCTGACCC CAGCGGCCAG TGGAGGGACA 3060 

CCAGGGTCTA TGGTGTTTTC TCCAACCCCT GGTGAGTGGC CCTTGTCCTG GGGCCGGGGC 3120 
TGGCATTGGT TCAGTGTCCA GTAGGGACAG GAGGCCTTGG GCCCTGCTGA GGGCCTCCCT 3180

GGTGTGGCAG GAGCAGGGGC TGCAGGCTCA AGAGGCTGGG CTGTTGCTGG GTGTGGGGTG 3240

GGGGGACAGC CAGTGCGATG TATGTACTGT TGTGTGAGTG AGTCTGCACT CATGGGTGTG 3300 

TGTGCATGCC CTATATGCAC ACTCATGACT GCACTTGTGC CTGTGTGTCC CACCACCTGC 3360 

TTGTGCCGAG AGTGGACACT GGGCCCAGGA GGAAGCTGCT GAAGCATCTC TCGGGGAGCT 3420 

GGGTGCTATT ACACCTGCTC AGGCACTGCC TGAGCCCGAT AATTCACACT TCTTAATCAC 3480 

TCTCATTGAT TGAACACACG GCAGGCGGAA GTGTTGGGTG TGTGTGGGGA GAGTTAGGGA 3540 



TAGAGTGGAG GAAGCCAAGA CCCTGCTCTG TGGCTCCTGG GTGAGTGGGT CCCCCAGGCT 3600 

GGGAAGGGGT TGGGGGTCTG GCCTCCTGGG GCATCAGCAC CCCACAGCCT GTGCCCAGGG 3660 

AGGGCTAGAG AACTGCTCAG CCTATGATGG GGTTCCTCCT GCCTTGGGGT TGGGTAGAGC 3720 

AGATGGCCTC TAGACTCAGT GATTCTGTAA CAGGATACAA GTTTGTGGTT TTAAATTGCA 3780 

GCACAAAGAA ATTAGGCTGA ACTCCTCTCC TTCCTCCTCT CCATCCCTCC CCATTTTCAG 3840 

TGGTGGTTGG CAACTCAGTG CCAGGCACAA GGCTGGCCTG GGTGAGTGGA GGTGGATGGG 3900 

TGGGTTCTGG GCCCCCCATT GAGCTGGTCT CCATGTCACT GCAGGAACTA CTCAGCCGTC 3960 

TGTGTGTATT CCCTCGGTGA CATTGACAAG GTCTTCCGTA CCTCCTCACT CAAGGGCTAC 4020 

CACTCAAGCC TTCCCAACCC GCGGCCTGGC AAGGTGAGCG TGACACCAGC CGTGGCCCAG 4080 

GCCCAGCCCT CCTTCTGCCT CACCTCCCAC CACCCCACTG ACCTGGGCCT GCTCTCCTTG 4140 

CCCAGTGCCT CCCAGACCAG CAGCCGATAC CCACAGAGAC CTTCCAGGTG GCTGACCGTC 4200 

ACCCAGAGGT GGCGCAGAGG GTGGAGCCCA TGGGGCCTCT GAAGACGCCA TTGTTCCACT 4260 

CTAAATACCA CTACCAGAAA GTGGCCGTCC ACCGCATGCA AGCCAGCCAC GGGGAGACCT 4320 

TTCATGTGCT TTACCTAACT ACAGGTGAGA GGCTACCCCG GGACCCTCAG TTTGCTTTGT 4380 

AAAAACGGGC ATGAAAGGTG TAAGGAATAA TGTAGTTAAC ATCTGGTTGG ATCTTTACAT 4440 

GTGGAAGGAA TAATTGAGTG ACTGGAGTTG TCAGGGGTTA ATGTGTGTGG GTGTGGAAGA 4500

GCCAGGCAGG GAGAGCTTCC TGGAGGAGGT AGGGGCAAGA GGGAAAGGGG GATGGGAGAA 4560

AAGCAAGCAC TGGGATTTGG AGGCGGAAAT CTGGAGAGTC TGAGCAAAGC CAGGTGCACC 4620 

TTTGGTCCAG ATGTCTGACT CAGGGAAGAA GATGGTAGGA AGAGACGTGG CAAATGAGGA 4680 

GGAGGGGCCT GAACCACAGG GATACTGGCC TCTGCCAGGC AGAATGAGGG AGTCAGGCCC 4740 

TGCGCCTGTC TTTGGGATTG TGCAGGTGAG AAGAAACATT TGAGGAGTTG ATGGGGCACA 4800 

AATTAGGTAT GGGGAAGGAG TTCCAGGGGG CAGAACCTTT GCCATCTCAC AGAGGACAGG 4860 

GGCAGCTTCT CTTCTTCCCT GGAGTAGGCC CTGCTGGGGG AAGCTGGGTG GAATGCCGTG 4920 

GGAGATGCTC CTGCTTTCTG GAAAGCCACA GGACACGGAG GAGCCAGTCC TGAGTTGGGT 4980 

TTGTCGCAGC TTCCCATGCC AGCTGCCTTC CTTGAGACTG GAAAGGGCCT CTAGCACCCC 5040 

TGGGGCCATT CAATTCAGGC CCAGGCGCCC AACCTCAGTT GTTCACATTC CCCATGTGAT 5100 

CTCCTGTTGC TGCTTCACCT TGGGACTGTC TCGGCTTTGG TGACCTTGTA GGAAACTGGA 5160 

ACCCCAGCAC CATTGTTTGG CTCCTGGAAG CCTTGGGGAG AGGAATTTGC CACAGGGCAG 5220 

GGCCTGGGTC CTGATTCCCT GCCTCTTTAC TCCCTATTCA TCCCGGCTAC ACCCTTGGGC 5280
CCCCATCCTT GCTTGGCTCC AGTACTGGCT GGCACAGCTG TTGTGGTCAT CCAGGGATGG 5340

CAGGGCACTG GGGAACAGAA GAGAGAGGTC ACACAGTGCG GAACTGGGAG CAGGAGCTAG 5400 

GACAAGGAAG GCTGGACTTG GGCCATGGAT TCCCTTCCTG CAGACTTGGG AAGTGAGCAC 5460 

ACTTGAGTGA TTAGAGAAGG TGTCTTCGTT CTAAGGGCAG TGGAGGAGGC ACCATTTTGG 5520 

AGCCTGCATC ATTCGTATTT GGGCTAGATT GAAAAATAGA GCTTTCTAAG TCCTCTGCAG 5580 

AGAATGGGAG GCTCTCACAA CTGGGAGAAG TATTGGCTCT TTTCCTGAGA ATTTTGCCAA 5640 

GGGTATGCTG TTACTGGGGC TGGTTTGGAA GGAGTATAGG GCATTATGTC TGTGAAGGCA 5700

GTGGCTGGGG TGGGGCCTTA TCAGGCCCAA GGAGCATCTG GCCACATCTC AGAGTCCACA 5760 

GATGAGGATC ACGGATGTGT AGAGGAAACA TCCTAGGCAG GCAATCATCT GACTGCTTTT 5820 

TTGGGGCAGG TGATGCCCTG GGAAATTGGG AGGGAGGGAG AGAGGGAGGT AGGCTATTCT 5880 

AGAAACTGGG AGAGCAGGTG AGGTAGGATT GGGAGGACCA GGGGTCAGGG TCCCCATTGG 5940

TCCCTAATTG AGAACGGAGA GAGCATTGGT CTAGGAGGCA GGCAGCTCGG TTATAAGACC 6000 

TTGGGAACTC TTGATTTAGA ATCCAAGATC CTTTTTAGAT CTAGGATTTT ATAAAATTAA 6060 

GATATCCCCT AAGATCAAAT GCAACGTGGA GTCCTGAATT GGATCCTAGA ACAGAAGAAG 6120

GACATTTGTG GAAAAACTAG TGAAATCCAA ATAAAGTCTG TAGTTTTGTT AATAGTAATG 6180 

CACCAATGTC AGTTGCCTAG TTGTGACAAA TATACCGTGG TTATGTAAGA TGGTAACATT 6240

AGGGGGAACT GGAGAAGGGT AGATTGGAGC TCTCTGTACT ATCTTTGCAA CTTTTCTGGG 6300 

AATCTAAAAT TACTCCAAAA TAAAAAAAAA ATGTATTTAA AGTAAATATA TTCCCTAAGA 6360
GTCCAGGAGG CAGGGGAGTT GTAGAAGCAG CTGAGTGGTT GGGTTCTGAC AGATTTGGTT 6420

CCAACTCGGT CTCTGCTGCT CACCAGCTGT GTGACCTTGA GCAAGTGGCT TAGCCTTTCT 6480

GAGCCTGATT TCCTTATCTG TGGAGTGGGG AAGATGACAG CCACCTCGCA GGGCTGTGGA 6540
GGGTTAAACG AGGTGATGCA TGGACAGCAG CCGCACTGAC CTTGCTGGTG TGGGGCTCCT 6600

GCTTCTGTTC TTCCCGTGCA GCCTTGGGAA TGTTGGAGGC CGTATCCAGG GACCCCTGGG 6660 

CCTCCTGGGA TGGCCTCTCT GGATCAGCCT TGGAAGGTTC CAGGCTGCCC TTAGGCTCCC 6720 

ACATTCTTCC CCAGTCACGC TCTCCTCGCC CTGCCCACAC CAGTCCTGTG ACCCTTGCCT 6780 

GAGTTGTGAC TTCCCACCCC TCCCCGGCCT AGAGGAAAGC TGCCTGGCCC CTCAGTGGGA 6840

CTCCCGCCCA CTGACCCTCT GTCCACCATA CACAGACAGG GGCACTATCC ACAAGGTGGT 6900

GGAACCGGGG GAGCAGGAGC ACAGCTTCGC CTTCAACATC ATGGAGATCC AGCCCTTCCG 6960




CCGCGCGGCT GCCATCCAGA CCATGTCGCT GGATGCTGAG CGGGTGAGCC TTCCCCCACT 7020 

GCGTCCCATG GGCTATGCAG TGACTGCAGC TGAGGACAGG GCTCCTTTGC ATGTGATTTG 7080 

TGTGTTCTTT TAAGAGCTTC TAGGCCTTAG GGCCTGGACA TTTAGGACTG AGTGTGGGGT 7140 

GGGGCCCGGG CCTGACCCAA TCCTGCTGTC CTTCCAGAGG AAGCTGTATG TGAGCTCCCA 7200 

GTGGGAGGTG AGCCAGGTGC CCCTGGAGCT GTGTGAGGTC TATGGCGGGG GCTGCCACGG 7260 

TTGCCTCATG TCCCGAGACC CCTACTGCGG CTGGGACCAG GGCCGCTGCA TCTCCATCTA 7320 

CAGCTCCGAA CGGTACGTTG GCCGGGATCC CTCCGTCCCT GGGACAAGGT GGGCATGGGA 7380 

CAGGGGGAGG TGTTGTCGGG CTGGAAGAGG TGGCGGTACT GGGCCTTTCT TGTGGGACCT 7440 

CCTCTCTACT GGAACTGCAC TAGGGGTAAG GATATGAGGG TCAGGTCTGC AGCCTTGTAT 7500 

CTGCTGATCC TCTTTCGTCC TTCCCACTCC AGGTCAGTGC TGCAATCCAT TAATCCAGCC 7560 

GAGCCACACA AGGAGTGTCC CAACCCCAAA CCAGGTACCT GATCTGGCCC TGCTGGCGGC 7620 

TGTGGCCCAA TGAGTGGGGT ACTGCCCTGC CCTGATTGTC CTGGTCTGAG GGAAACATGG 7680 

CCTTGTCCTG TGGGCCCCAG GTACATGGGG CAGGATACAG TCCTGCAGAG GGAGCCCTCT 7740 

TGGTGGGATG AGCGAGACGG GAGAAAAAAG GAGGACGCTG AGGGCTGGGT TCCCCACGTT 7800 

CATTCAGAAG CCTTGTCCTG GGATCCCAGT CGGTGGGGAG GACACATCCT CCCCTGGGAG 7860 

CTCTTTGTCC CTCCTCACGG CTGCTTCCCC ACTGCCTCCC CAGACAAGGC CCCACTGCAG 7920 

AAGGTTTCCC TGGCCCCAAA CTCTCGCTAC TACCTGAGCT GCCCCATGGA ATCCCGCCAC 7980 

GCCACCTACT CATGGCGCCA CAAGGAGAAC GTGGAGCAGA GCTGCGAACC TGGTCACCAG 8040 

AGCCCCAACT GCATCCTGTT CATCGAGAAC CTCACGGCGC AGCAGTACGG CCACTACTTC 8100

TGCGAGGCCC AGGAGGGCTC CTACTTCCGC GAGGCTCAGC ACTGGCAGCT GCTGCCCGAG 8160 

GACGGCATCA TGGCCGAGCA CCTGCTGGGT CATGCCTGTG CCCTGGCCGC CTCCCTCTGG 8220 

CTGGGGGTGC TGCCCACACT CACTCTTGGC TTGCTGGTCC ACTAGGGCCT CCCGAGGCTG 8280 

GGCATGCCTC AGGCTTCTGC AGCCCAGGGC ACTAGAACGT CTCACACTCA GAGCCGGCTG 8340 

GCCCGGGAGC TCCTTGCCTG CCACTTCTTC CAGGGGACAG AATAACCCAG TGGAGGATGC 8400 

CAGGCCTGGA GACGTCCAGC CGCAGGCGGC TGCTGGGCCC CAGGTGGCGC ACGGATGGTG 8460 

AGGGGCTGAG AATGAGGGCA CCGACTGTGA AGCTGGGGCA TCGATGACCC AAGACTTTAT 8520 
CTTCTGGAAA ATATTTTTCA GACTCCTCAA ACTTGACTAA ATGCAGCGAT GCTCCCAGCC 8580

CAAGAGCCCA TGGGTCGGGG AGTGGGTTTG GATAGGAGAG CTGGGACTCC ATCTCGACCC 8640 

TGGGGCTGAG GCCTGAGTCC TTCTGGACTC TTGGTACCCA CATTGCCTCC TTCCCCTCCC 8700 



TCTCTCATGG CTGGGTGGCT GGTGTTCCTG AAGACCCAGG GGTAGCCTGT GTCCAGCCCT 8760
GTCCTCTGCA GCTCCCTCTC TGGTCCTGGG TCCCAGAGGA CAGCCGCCTT GCATGTTTAT 8820
TGAAGGATGT TTGCTTTCCG GACGGAAGGA CGGAAAAAGC TCTGAAAAAA AAAAAAAAAA 8880
AAAAAAAA 8888

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6622 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GATATCATGG AGATAATTAA AATGATAACC ATCTCGCAAA TAAATAAGTA TTTTACTGTT 60
TTCGTAACAG TTTTGTAATA AAAAAACCTA TAAATATGAA ATTCTTAGTC AACGTTGCCC 120
TTGTTTTTAT GGTCGTATAC ATTTCTTACA TCTATGCGGA TCGATGGGGA TCCGCCCAGG 180
GCCACCTAAG GAGCGGACCC CGCATCTTCG CCGTCTGGAA AGGCCATGTA GGGCAGGACC 240
GGGTGGACTT TGGCCAGACT GAGCCGCACA CGGTGCTTTT CCACGAGCCA GGCAGCTCCT 300
CTGTGTGGGT GGGAGGACGT GGCAAGGTCT ACCTCTTTGA CTTCCCCGAG GGCAAGAACG 360
CATCTGTGCG CACGGTGAAT ATCGGCTCCA CAAAGGGGTC CTGTCTGGAT AAGCGGGACT 420
GCGAGAACTA CATCACTCTC CTGGAGAGGC GGAGTGAGGG GCTGCTGGCC TGTGGCACCA 480
ACGCCCGGCA CCCCAGCTGC TGGAACCTGG TGAATGGCAC TGTGGTGCCA CTTGGCGAGA 540
TGAGAGGCTA TGCCCCCTTC AGCCCGGACG AGAACTCCCT GGTTCTGTTT GAAGGGGACG 600
AGGTGTATTC CACCATCCGG AAGCAGGAAT ACAATGGGAA GATCCCTCGG TTCCGCCGCA 660
TCCGGGGCGA GAGTGAGCTG TACACCAGTG ATACTGTCAT GCAGAACCCA CAGTTCATCA 720
AAGCCACCAT CGTGCACCAA GACCAGGCTT ACGATGACAA GATCTACTAC TTCTTCCGAG 780
AGGACAATCC TGACAAGAAT CCTGAGGCTC CTCTCAATGT GTCCCGTGTG GCCCAGTTGT 840
GCAGGGGGGA CCAGGGTGGG GAAAGTTCAC TGTCAGTCTC CAAGTGGAAC ACTTTTCTGA 900
AAGCCATGCT GGTATGCAGT GATGCTGCCA CCAACAAGAA CTTCAACAGG CTGCAAGACG 960
TCTTCCTGCT CCCTGACCCC AGCGGCCAGT GGAGGGACAC CAGGGTCTAT GGTGTTTTCT 1020
CCAACCCCTG GAACTACTCA GCCGTCTGTG TGTATTCCCT CGGTGACATT GACAAGGTCT 1080

TCCGTACCTC CTCACTCAAG GGCTACCACT GAAGCCTTCC CAACCCGCGG CCTGGCAAGT 1140

GCCTCCCAGA CCAGCAGCGG ATACCGACAG AGACCTTCCA GGTGGCTGAC CGTGACCCAG 1200

AGGTGGCGCA GAGGGTGGAG CCCATGGGGC CTCTGAAGAC GGCATTGTTC CACTCTAAAT 1260

ACCACTACCA GAAAGTGGCC GTTCACCGCA TGCAAGCCAG CCACGGGGAG ACCTTTCATG 1320

TGCTTTACCT AACTACAGAC AGGGGCACTA TCCACAAGGT GGTGGAAGCG GGGGAGGAGG 1380

AGCACAGCTT CGCCTTCAAC ATCATGGAGA TCCAGCCCTT CCGCCGCGCG GCTGCCATCC 1440

AGACCATGTC GCTGGATGCT GAGCGGAGGA AGCTGTATGT GAGCTCCCAG TGGGAGGTGA 1500

GCCAGGTGCC CCTGGACCTG TGTGAGGTCT ATGGCGGGGG CTGCCACGGT TGCCTCATGT 1560

CCCGAGACCC CTACTGCGGC TGGGACCAGG GCCGCTGCAT CTCCATCTAC AGCTCCGAAC 1620

GGTCAGTGCT GCAATCCATT AATCCAGCCG AGCCACACAA GGAGTGTCCC AACCCCAAAC 1680

CAGACAAGGC CCCACTGCAG AAGGTTTCCC TGGCCCCAAA CTCTCGCTAC TACCTGAGCT 1740

GCCCCATGGA ATCCCGCCAC GCCACCTACT CATGGCGCCA CAAGGAGAAC GTGGAGCAGA 1800

GCTGCGAACC TGGTCACCAG AGCCCCAACT GCATCCTGTT CATCGAGAAC CTCACGGCGC 1860

AGCAGTACGG CCACTACTTC TGCGAGGCCC AGGAGGGCTC CTACTTCCGC GAGGCTCAGC 1920

ACTGGCAGCT GCTGCCCGAG GACGGCATCA TGGCCGAGCA CCTGCTGGGT CATGCCTGTG 1980

CCCTGGCTGC CTGAATTCGA AGCTTGGAGT CGACTCTGCT GAAGAGGAGG AAATTCTCCT 2040

TGAAGTTTCC CTGGTGTTCA AAGTAAAGGA GTTTGCACCA GACGCACCTC TGTTCACTGG 2100

TCCGGCGTAT TAAAACACGA TACATTGTTA TTAGTACATT TATTAAGCGC TAGATTCTGT 2160

GCGTTGTTGA TTTACAGACA ATTGTTGTAC GTATTTTAAT AATTCATTAA ATTTATAATC 2220

TTTAGGGTGG TATGTTAGAG CGAAAATCAA ATGATTTTCA GCGTCTTTAT ATCTGAATTT 2280

AAATATTAAA TCCTCAATAG ATTTGTAAAA TAGGTTTCGA TTAGTTTCAA ACAAGGGTTG 2340

TTTTTCCGAA CCGATGGCTG GACTATCTAA TGGATTTTCG CTCAACGCCA CAAAACTTGC 2400

CAAATCTTGT AGCAGCAATC TAGCTTTGTC GATATTCGTT TGTGTTTTGT TTTGTAATAA 2460

[illegible] [illegible] [illegible] TTTTfiTRTTT CTTTCATCAC TGTCGTTAGT 2520

GTACAATTGA CTCGACGTAA ACACGTTAAA TAAAGCCTGG ACATATTTAA CATCGGGCGT 2580

GTTAGCTTTA TTAGGCCGAT TATCGTCGTC GTCCCAACCC TCGTCGTTAG AAGTTGCTTC 2640

CGAAGACGAT TTTGCCATAG CCACACGACG CCTATTAATT GTGTCGGCTA ACACGTCCGC 2700
GATCAAATTT GTAGTTGAGC TTTTTGGAAT TATTTCTGAT TGCGGGCGTT TTTGGGCGGG 2760

TTTCAATCTA ACTGTGCCCG ATTTTAATTC AGACAACACG TTAGAAAGCG ATGGTGCAGG 2820 

CGGTGGTAAC ATTTCAGACG GCAAATCTAC TAATGGCGGC GGTGGTGGAG CTGATGATAA 2880 

ATCTACCATC GGTGGAGGCG CAGGCGGGGC TGGCGGCGGA GGCGGAGGCG GAGGTGGTGG 2940 

CGGTGATGCA GACGGCGGTT TAGGCTCAAA TTGTCTCTTT CAGGCAACAC AGTCGGCACC 3000

TCAACTATTG TACTGGTTTC GGGCGTATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 3060 

GCATAGTTAA GCCAGCCCCG ACACCCGCCA ACACCCGCTG ACGCGCCCTG ACGGGCTTGT 3120 

CTGCTCCCGG CATCCGCTTA CAGACAAGCT GTGACCGTCT CCGGGAGCTG CATGTGTCAG 3180 

AGGTTTTCAC CGTCATCACC GAAACGCGCG AGACGAAAGG GCGTCGTGAT ACGCCTATTT 3240 

TTATAGGTTA ATGTCATGAT AATAATGGTT TCTTAGACGT CAGGTGGCAC TTTTCGGGGA 3300 

AATGTGCGCG GAACCCCTAT TTGTTTATTT TTCTAAATAC ATTCAAATAT GTATCCGCTC 3360 

ATGAGACAAT AACCCTGATA AATGCTTCAA TAATATTGAA AAAGGAAGAG TATGAGTATT 3420 

CAACATTTCC GTGTCGCCCT TATTCCCTTT TTTGCGGCAT TTTGCCTTCC TGTTTTTGCT 3480 

CACCCAGAAA CGCTGGTGAA AGTAAAAGAT GCTGAAGATC AGTTGGGTGC ACGAGTGGGT 3540 

TACATCGAAC TGGATCTCAA CAGCGGTAAG ATCCTTGAGA GTTTTCGCCC CGAAGAACGT 3600

TTTCCAATGA TGAGCACTTT TAAAGTTCTG CTATGTGGCG CGGTATTATC CCGTATTGAC 3660 

GCCGGGCAAG AGCAACTCGG TCGCCGCATA CACTATTCTC AGAATGACTT GGTTGAGTAC 3720 

TCACCAGTCA CAGAAAAGCA TCTTACGGAT GGCATGACAG TAAGAGAATT ATGCAGTGCT 3780 

GCCATAACCA TGAGTGATAA CACTGCGGCC AACTTACTTC TGACAACGAT CGGAGGACCG 3840 

AAGGAGCTAA CCGCTTTTTT GCACAACATG GGGGATCATG TAACTCGCCT TGATCGTTGG 3900 

GAACCGGAGC TGAATGAAGC CATACCAAAC GACGAGCGTG ACACCACGAT GCCTGTAGCA 3960 

ATGGCAACAA CGTTGCGCAA ACTATTAACT GGCGAACTAC TTACTCTAGC TTCCCGGCAA 4020 

CAATTAATAG ACTGGATGGA GGCGGATAAA GTTGCAGGAC CACTTCTGCG CTCGGCCCTT 4080 

CCGGCTGGCT GGTTTATTGC TGATAAATCT GGAGCCGGTG AGCGTGGGTC TCGCGGTATC 4140 

ATTGCAGCAC TGGGGCCAGA TGGTAAGCCC TCCCGTATCG TAGTTATCTA CACGACGGGG 4200 

AGTCAGGCAA CTATGGATGA ACGAAATAGA CAGATCGCTG AGATAGGTGC CTCACTGATT 4260 

AAGCATTGGT AACTGTCAGA CCAAGTTTAC TCATATATAC TTTAGATTGA TTTAAAACTT 4320 

CATTTTTAAT TTAAAAGGAT CTAGGTGAAG ATCCTTTTTG ATAATCTCAT GACCAAAATC 4380 

CCTTAACGTG AGTTTTCGTT CCACTGAGCG TCAGACCCCG TAGAAAAGAT CAAAGGATCT 4440 



TCTTGAGATC CTTTTTTTCT GCGCGTAATC TGCTGCTTGC AAACAAAAAA ACCACCGCTA 4500 

CCAGCGGTGG TTTGTTTGCC GGATCAAGAG CTACCAACTC TTTTTCCGAA GGTAACTGGC 4560

TTCAGCAGAG CGCAGATACC AAATACTGTT CTTCTAGTGT AGCCGTAGTT AGGCCACCAC 4620 

TTCAAGAACT CTGTAGCACC GCCTACATAC CTCGCTCTGC TAATCCTGTT ACCAGTGGCT 4680 

GCTGCCAGTG GCGATAAGTC GTGTCTTAGC GGGTTGGACT CAAGACGATA GTTACCGGAT 4740 

AAGGCGCAGC GGTCGGGCTG AACGGGGGGT TCGTGCACAC AGCCCAGCTT GGAGCGAACG 4800 

ACCTACACCG AACTGAGATA CCTACAGCGT GAGCTATGAG AAAGCGCCAC GCTTCCCGAA 4860 

GGGAGAAAGG CGGACAGGTA TCCGGTAAGC GGCAGGGTCG GAACAGGAGA GCGCACGAGG 4920 

GAGCTTCCAG GGGGAAACGC CTGGTATCTT TATAGTCCTG TCGGGTTTCG CCACCTCTGA 4980 

CTTGAGCGTC GATTTTTGTG ATGCTCGTCA GGGGGGCGGA GCCTATGGAA AAACGCCAGC 5040 

AACGCGGCCT TTTTACGGTT CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GTTCTTTCCT 5100 

GCGTTATCCC CTGATTCTGT GGATAACCGT ATTACCGCCT TTGAGTGAGC TGATACCGCT 5160 

CGCCGCAGCC GAACGACCGA GCGCAGCGAG TCAGTGAGCG AGGAAGCATC CTGCACCATC 5220

GTCTGCTCAT CCATGACCTG ACCATGCAGA GGATGATGCT CGTGACGGTT AACGCCTGGA 5280 

ATCAGCAACG GCTTGCCGTT CAGCAGCAGC AGACCATTTT CAATCCGCAC CTCGCGGAAA 5340 

CCGACATCGC AGGCTTGTGC TTCAATCAGC GTGCCGTCGG CGGTGTGCAG TTCAACCACC 5400 

GCACGATAGA GATTCGGGAT TTCGGCGCTC CACAGTTTCG GGTTTTCGAC GTTCAGACGT 5460 

AGTGTGACGC GATCGGTATA ACCACCACGC TCATCGATAA TTTCACCGCC GAAAGGCGCG 5520 

GTGCCGCTGG CGACCTGCGT TTCACCCTGC CATAAAGAAA CTGTTACCCG TAGGTAGTCA 5580 

CGCAACTCGC CGCACATCTG AACTTCAGCC TCCAGTACAG CGCGGCTGAA ATCATCATTA 5640

AAGCGAGTGG CAACATGGAA ATCGCTGATT TGTGTAGTCG GTTTATGCAG CAACGAGACG 5700 

TCACGGAAAA TGCCGCTCAT CCGCCACATA TCCTGATCTT CCAGATAACT GCCGTCACTC 5760 

CAACGCAGCA CCATCACCGC GAGGCGGTTT TCTCCGGCGC GTAAAAATGC GCTCAGGTCA 5820 

AATTCAGACG GCAAACGACT GTCCTGGCCG TAACCGACCC AGCGCCCGTT GCACCACAGA 5880 

TGAAACGCCG AGTTAACGCC ATCAAAAATA ATTCGCGTCT GGCCTTCCTG TAGCCAGCTT 5940 

TCATCAACAT TAAATCTGAG CGAGTAACAA CCCGTCGGAT TCTCCGTGGG AACAAACGGC 6000

GGATTGACCG TAATGGGATA GGTCACGTTG GTGTAGATGG GCGCATCGTA ACCGTGCATC 6060

TGCCAGTTTG AGGGGACGAC GACAGTATCG GCCTCAGGAA GATCGCACTC CAGCCAGCTT 6120 



TGCGGCACCG CTTCTGGTGC CGGAAACCAG GCAAAGCGCC ATTCGCCATT CAGGCTGCGC 6180
AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT TACGCCAGCT GGCGAAAGGG 6240
GGATGTGCTG CAAGGCGATT AAGTTGGGTA ACGCCAGGGT TTTCCCAGTC ACGACGTTGT 6300
AAAACGACGG GATCTATCAT TTTTAGCAGT GATTCTAATT GCAGCTGCTC TTTGATACAA 6360
CTAATTTTAC GACGACGATG CGAGCTTTTA TTCAACCGAG CGTGCATGTT TGCAATCGTG 6420
CAAGCGTTAT CAATTTTTCA TTATCGTATT GTTGCACATC AACAGGCTGG ACACCACGTT 6480
GAACTCGCCG CAGTTTTGCG GCAAGTTGGA CCCGCCGCGC ATCCAATGCA AACTTTCCGA 6540
CATTCTGTTG CCTACGAACG ATTGATTCTT TGTCCATTGA TCGAAGCGAG TGCCTTCGAC 6600
TTTTTCGTGT CCAGTGTGGC TT 6622
(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
CCGGATCCGC CCAGGGCCAC CTAAGGAGCG G 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 
CTGAATTCAG GAGCCAGGGC ACAGGCATG 